# Revealing Data: Creepy or Curious?

**Video Category:** Human-Computer Interaction Seminar / Data Ethics

## 📋 0. Video Metadata

* **Video Title:** Revealing Data: Creepy or Curious?
* **YouTube Channel:** Stanford Center for Professional Development (scpd.stanford.edu)
* **Publication Date:** March 9, 2018
* **Video Duration:** ~1 hour 15 minutes

## 📝 1. Core Summary (TL;DR)

The video explores the increasing prevalence of "creepy data" collection, particularly through covert technologies like Emotional AI and facial recognition. To counter this trend, the speaker proposes alternative, open approaches to data collection where the public is actively engaged and curious rather than suspicious. Through physical, interactive data installations, the research demonstrates that making data collection transparent and tangible can foster public participation, shifting the paradigm from passive surveillance to active civic engagement.

## 2. Core Concepts & Frameworks

* **Concept:** Breaching Experiments -> **Meaning:** A methodology developed by sociologists Goffman and Garfinkel designed to disrupt accepted codes of conduct and unstated social norms to observe genuine human reactions. -> **Application:** Used to gauge true public sentiment regarding extreme data collection scenarios, such as the "Quantified Toilets" experiment where users believed their bodily waste was being analyzed.
* **Concept:** Emotional AI / Subliminal Facial Recognition -> **Meaning:** Artificial intelligence systems that analyze micro-twitches in facial expressions to infer an individual's emotional state, gender, age, and characteristics like honesty or passion. -> **Application:** Deployed covertly in retail settings to measure shopper dwell times and by hiring startups to evaluate job candidates' personalities without their explicit knowledge or consent.
* **Concept:** Neurological Privacy -> **Meaning:** A societal concern defined by researcher Lydia Nicholas, referring to the ethical dilemma of technologies inferring internal emotional states, intent, and subconscious thoughts from physiological cues. -> **Application:** Highlights the danger of a "data divide" where users lack control over or understanding of what their bodies are involuntarily revealing to tracking systems.
* **Concept:** Physical Ambient Visualizations -> **Meaning:** Tangible devices or installations that collect or display data in the physical environment rather than on traditional digital screens. -> **Application:** Devices like the "PhysiKit" and "Sense-Us" use physical sliders, lights, and rotating discs to make data collection more accessible, understandable, and engaging for the general public.

## 3. Evidence & Examples (Hyper-Specific Details)

* **The "Quantified Toilets" Breaching Experiment:** Set up during a CHI workshop in Toronto, researchers created a fake company claiming to analyze urine in public toilets for "public health." Signage stated: "Behaviour at these toilets is being recorded for analysis." A public website displayed fake, real-time data including Toilet ID, Sex, Deposit amount, Odor, Blood alcohol, Drugs detected, Pregnancy, and Infections. The hoax sparked rapid Twitter outrage and articles in *The Atlantic* and *The Washington Post*. Public reactions ranged from disapproval ("Health advice? It does not get any creepier") to voyeurism ("Can't stop watching the pee-pee logs"). The study received top marks for publication but was blocked due to a lack of prior IRB ethics approval.
* **Emotional AI in Retail and Hiring:** Retailers use in-store cameras to convert shopper faces into biometric templates to measure responses to specific product displays in milliseconds.
In hiring, a London startup called "Human" (founded in 2016) calculates scores for how 'honest', 'nervous', or 'passionate' an applicant is during a video interview. The hiring company (e.g., Stanford looking for a "curious, creative type") receives a compiled report, while the interviewee remains entirely unaware that their micro-expressions are being analyzed.
* **Sense-Us (Reinventing the Census):** Deployed in Somerset House, London, this installation reimagined the 100-year-old Census. Researchers built physical voting boxes using sliders, buttons, and dials to collect data on Health, City Life, Trust, and Belonging. Questions included "What blood type are you?" and "What do people do with unwanted gifts?" (near Christmas; results showed nearly half threw them away or sold them). Over 1,000 people engaged over 4 weeks, spending 10-20 minutes answering questions via a smart card system. The physical interface put users "in the zone," and they readily left their physical slider answers visible for others to see.
* **PhysiKit (Engaging with Home Data):** Researchers provided 5 families with the open-source "Smart Citizen" sensor kit to measure humidity, light, NO2, and CO. Initial engagement with the web dashboard dropped off within days. Researchers then introduced "PhysiKit" physical cubes: *PhysiLight* (LED matrix), *PhysiMove* (rotating disc), *PhysiAir* (air flow), and *PhysiBuzz* (vibration). Users programmed rules via a tablet. Examples included: a user setting PhysiLight to alert on high air pollution next to their bed; a user placing a basil plant on PhysiMove so it would rotate toward good air quality; and a mother using PhysiLight to show her children how loud they were compared to the neighborhood.
* **Roam-io (Crowdsourcing Tourist Data in Madeira):** The Madeira island government (1.5M tourists annually) installed passive WiFi sensors to track MAC addresses, producing data spikes they couldn't interpret.
Researchers deployed "Roam-io," an anthropomorphic, 1-meter-tall physical kiosk in the capital, Funchal. Over 5 days, 500 people answered more than 1,000 questions (76% English, 34% Portuguese). When asked why Funchal was busy at 10:00 AM, locals explained it was due to the sale of "bolo do caco" (traditional Portuguese bread). When asked why the airport spiked on Mondays, users attributed it to incoming tourists (61%) and business flights (33%), showing that the public could supply explanations the automated system alone could not.

## 4. Actionable Takeaways (Implementation Rules)

* **Rule 1: Use breaching experiments to reveal true attitudes.** To discover authentic public sentiment about data collection, design scenarios that push or violate expected norms rather than relying solely on hypothetical self-reported surveys.
* **Rule 2: Make data collection transparent and highly visible.** Combat the "creepy" nature of covert sensors by designing prominent, physical interfaces that clearly indicate what data is being collected and why.
* **Rule 3: Design for tangible, physical interaction.** Use physical input controls like sliders, buttons, and dials rather than generic touchscreens. Physicality provokes curiosity, lowers the barrier to entry, and makes the interaction feel like play rather than administration.
* **Rule 4: Provide immediate, comparative feedback.** Allow users to see their personal data in the context of aggregate community data (e.g., the Sense-Us visualization station) to trigger self-reflection and spontaneous group discussion.
* **Rule 5: Allow users to customize physical data mappings.** Give users the tools to map abstract data (like CO2 or noise levels) to tangible outputs (like a moving disc or vibrating box) based on logical rules they define themselves, fostering a sense of ownership.
* **Rule 6: Use crowdsourced context to enrich automated data.** Combine passive, quantitative tracking data (like WiFi flows) with active, qualitative public input to uncover the 'why' behind the 'what'.

## 5. Pitfalls & Limitations (Anti-Patterns)

* **Pitfall:** Relying on unseen "subliminal" data collection (Emotional AI). -> **Why it fails:** It removes the user's ability to consent or control what is inferred about their internal state, directly violating neurological privacy and creating a massive power imbalance. -> **Warning sign:** Users express outrage or feel violated when they discover their expressions or biometrics were analyzed after an event (e.g., a job interview).
* **Pitfall:** Defaulting to complex web dashboards for citizen science data. -> **Why it fails:** The general public struggles to translate abstract line graphs and charts into meaningful actions regarding their daily lived experience. -> **Warning sign:** High initial setup rates followed by near-zero engagement within the first week of deployment.
* **Pitfall:** Conducting extreme breaching experiments without formal ethical oversight. -> **Why it fails:** Even if the research yields highly valuable behavioral insights, the lack of IRB approval renders the data unpublishable in formal academic literature. -> **Warning sign:** Ethics committees and program chairs block publication despite top-tier peer review scores.
* **Pitfall:** Assuming passive quantitative data tells the complete story. -> **Why it fails:** Sensors track movement and location but completely miss the cultural, social, or practical reasons for human behavior. -> **Warning sign:** City planners or authorities cannot explain sudden anomalies or spikes in their data dashboards.

## 6. Key Quote / Core Insight

"Rather than extracting data subliminally and without consent, we should design highly visible, tangible systems that invite people to become curious data citizens, actively interpreting and reflecting on the information that shapes their environment."

## 7. Additional Resources & References

* **Resource:** "What a Toilet Hoax Can Tell Us About the Future of Surveillance" - **Type:** Article (The Atlantic) - **Relevance:** Media coverage highlighting the societal reaction to the Quantified Toilets breaching experiment.
* **Resource:** Smart Citizen (smartcitizen.me) - **Type:** Open-source hardware toolkit - **Relevance:** The sensor kit developed by Tomas Diez (Fab Lab Barcelona) used as the foundation for the PhysiKit home data experiment.
* **Resource:** Intel ICRI (Intel Collaborative Research Institute on Urban IoT) - **Type:** Research Institute - **Relevance:** The organization funding the research into sustainable, connected cities at UCL and Imperial College London.
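
To make Rule 5 more concrete, the following is a minimal sketch of how user-defined rules might map sensor readings to physical outputs, in the spirit of the PhysiKit examples. It is not the PhysiKit implementation: the `Rule` class, the `evaluate` function, the action labels, and the thresholds (40.0 for NO2, 70.0 for noise) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One user-defined mapping from a sensor reading to a physical output."""
    sensor: str                          # sensor name, e.g. "no2" or "noise"
    condition: Callable[[float], bool]   # returns True when the rule should fire
    device: str                          # output cube, e.g. "PhysiLight"
    action: str                          # hypothetical action label


def evaluate(rules: list[Rule], readings: dict[str, float]) -> list[tuple[str, str]]:
    """Return (device, action) pairs for every rule whose condition holds."""
    triggered = []
    for rule in rules:
        value = readings.get(rule.sensor)
        # Skip rules whose sensor has no current reading.
        if value is not None and rule.condition(value):
            triggered.append((rule.device, rule.action))
    return triggered


# Hypothetical rules a household might define; thresholds are illustrative only.
rules = [
    Rule("no2", lambda v: v > 40.0, "PhysiLight", "show_alert"),
    Rule("noise", lambda v: v > 70.0, "PhysiBuzz", "vibrate"),
]

readings = {"no2": 55.0, "noise": 48.0}
print(evaluate(rules, readings))  # [('PhysiLight', 'show_alert')]
```

Keeping each rule as plain data plus a condition means a household could add, remove, or retarget a mapping without touching the evaluation loop, which fits the ownership that Rule 5 aims for.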