# Computer Systems Colloquium: Big Data as Both a Window and a Mirror

**Video Category:** Technology, Human-Computer Interaction (HCI), and Data Science

## 📋 0. Video Metadata

* **Video Title:** Computer Systems Colloquium: Big data as both a window and a mirror
* **YouTube Channel:** Stanford Center for Professional Development
* **Publication Date:** October 11, 2013
* **Video Duration:** ~1 hour 14 minutes

## 📝 1. Core Summary (TL;DR)

The vast amounts of personal data generated through social media, blogs, and wearables are predominantly used as a "window" through which systems and corporations observe, model, and monetize user behavior. The same data can instead be repurposed as a "mirror" for personal reminiscing, helping individuals reflect on their past, work through current problems, and strengthen relationships. Designing effective systems for personal reflection requires shifting from task-based computing toward "everyday computing," and respecting the nuances of privacy, the positivity bias inherent in social media, and the limits of algorithmic interpretation.

## 2. Core Concepts & Frameworks

* **Concept:** Personal Data as a Window -> **Meaning:** The traditional paradigm in which algorithms, systems, and researchers analyze user-generated data to understand patterns, predict behavior, or build features. -> **Application:** Wikipedia's SuggestBot analyzing a user's past edits to recommend new articles they might want to work on.
* **Concept:** Personal Data as a Mirror -> **Meaning:** An alternative paradigm in which a system presents users' past data back to them as a catalyst for self-reflection and personal insight rather than for corporate monetization. -> **Application:** Sending a user an old diary entry or photograph to help them see how their emotional state or relationships have evolved over time.
* **Concept:** Everyday Computing vs. Task-Based Computing -> **Meaning:** Task-based computing uses a tool to complete a specific objective (e.g., writing a document). Everyday computing covers ambient, non-goal-oriented behaviors woven into daily life. -> **Application:** Checking a Facebook newsfeed or receiving a daily text-message prompt to reminisce; both demand little cognitive effort and have no strict completion criteria.
* **Concept:** The Positivity Bias in Public Venues -> **Meaning:** The tendency for people to share in public forums only content that portrays them in a positive, successful, or "special" light. -> **Application:** Recognizing that a Facebook timeline is an incomplete historical record, because it filters out the mundane and negative experiences that accurate self-reflection needs.
* **Concept:** Front Stage vs. Back Stage (Goffman) -> **Meaning:** A sociological framework for how people manage the impressions they make. The "front stage" is public performance; the "back stage" is the private reality behind it. -> **Application:** Designing private digital spaces (such as a private digital diary) where users can safely reflect on their public social media posts without fear of judgment from their network.

## 3. Evidence & Examples (Hyper-Specific Details)

* **SuggestBot (Window Example):** Built by the speaker, Dan Cosley. It analyzes what a user has edited on Wikipedia, combines that with knowledge of community needs (e.g., articles needing expansion or formatting), and recommends new editing tasks to the user. It models the user for the benefit of the system.
* **Twitter Sentiment Analysis (Window Example):** Conducted by Cornell researchers Michael Macy and Scott Golder, who analyzed roughly half a billion tweets worldwide to map "world happiness." They found that negative feelings dip and positive feelings rise early in the morning, showing how data can act as a window into large-scale human behavior.
* **Cosley's Personal Diary (1998):** Cosley started a diary because his ex-wife, Sue, teased him for not remembering their shared past. He recorded negative events, such as receiving a "nastygram" at work and an incident on 12/20/98 in which his younger sister aggressively shoved her Master's diploma in his face. He hid the file (renaming it "diary 1998") and left it unread.
* **Pensieve SMS Prototype (2008):** Cosley wrote a script that texted him snippets of the diary he had started in 1998, 4 to 5 times a day (e.g., an entry from 08/20/01 about an internship in New Jersey). This turned forgotten, often negative data into a tool for understanding his past relationships and personal growth.
* **Family Memories Radio (FM Radio):** An example of "everyday reminiscing" design. An old radio was retrofitted to record household sounds and later play them back at random, triggering spontaneous memories for the family without requiring any screen time.
* **Living Memory Box (Georgia Tech, 2003):** A physical box into which users place souvenirs (such as an Eiffel Tower statuette from a trip to Paris). The system scans the object and prompts the user to record a video explaining its meaning, so that descendants understand the object's context rather than seeing just a random trinket.
* **Pensieve Web Application Deployment:** A system built by undergraduates that aggregated data from Flickr, Picasa, Blogger, Twitter, and Last.fm. It emailed users prompts containing their past data and provided a private diary space for reflection. In a six-month deployment with 91 users, it sent 11,168 prompts and collected 730 diary entries written by 58 of those users.
* **PieTime Visualization (OJ Zhao and Tiffany Ng):** A project that visualized a user's Gmail history as a pie chart segmented by time. Users found the high-level patterns useless until they could drill down into the text of specific messages. For example, one user understood a large spike in her email volume in November only after seeing the specific messages and remembering that her mother had been sick.
* **"See Friendship" Feature on Facebook:** Research by Victoria Sosik and Xuan Zhao analyzed how romantic partners interacted on Facebook. They found that users actively managed privacy settings (e.g., hiding relationship statuses) to balance public performance with private intimacy.
* **Renren LoverSpace:** A feature on a Chinese social network designed specifically for couples. It created a private, shared space separate from the main feed, easing the tension between public performance and archiving an intimate relationship.
* **Expressive Writing Paradigm (James Pennebaker, UT Austin):** Research showing that simply asking people to write about traumatic or significant past events, without any grading or analysis of what they write, measurably improves their psychological well-being.
* **NPR StoryCorps:** An example of a system that structures social reminiscing. A van travels the US inviting two or three people who know each other well to interview one another, recording a personal conversation that serves as a shared memory artifact.

## 4. Actionable Takeaways (Implementation Rules)

* **Rule 1: Push Reminiscing into Existing Workflows** - Do not build standalone destination websites for reflection. Push triggers into spaces users already inhabit, such as email inboxes or SMS text messages, to match "everyday computing" behaviors.
* **Rule 2: Anchor Patterns in Particulars** - When designing data visualizations (such as activity graphs or email volumes), always let the user click through to the raw, underlying data (the specific photo, email, or post). High-level patterns fail to trigger memories without concrete contextual details.
* **Rule 3: Design for Private Reflection on Public Data** - Provide users with a "back stage" private area (such as a private text box attached to a prompt showing a public photo) where they can write honest reflections about a public post without those reflections being broadcast to their social network.
* **Rule 4: Embrace Algorithmic Ambiguity** - Do not try to build algorithms that perfectly predict what a user wants to remember. Instead, use simple randomized or time-based triggers (e.g., "one year ago today") and let the user's own mind construct the meaning.
* **Rule 5: Allow Decoupling from "The Stream"** - Recognize that the chronological news feed prioritizes recency. To support self-understanding, build interfaces that deliberately surface old, forgotten, or randomly selected data to counter the bias toward the present moment.

## 5. Pitfalls & Limitations (Anti-Patterns)

* **Pitfall: Relying on social media to build a complete psychological profile.** -> **Why it fails:** Users operate under a positivity bias and strict self-presentation norms on platforms like Facebook, actively filtering out mundane, sad, or complicated moments. -> **Warning sign:** An archive that looks entirely like a highlight reel of vacations and successes and offers no insight into personal struggles or growth.
* **Pitfall: Building "special purpose" destinations for everyday tasks.** -> **Why it fails:** Reminiscing is a spontaneous, low-effort background activity. If a system requires users to log in, navigate a new UI, and start the task manually, they will abandon it. -> **Warning sign:** High initial account creation followed by zero returning traffic (as seen with the Pensieve website interface).
* **Pitfall: Trying to algorithmically determine emotional value.** -> **Why it fails:** Machine learning models lack human context. An algorithm might highlight an aesthetically perfect landscape photo, while a blurry, poorly framed photo of a mushroom on a rock carries great emotional weight for the user. -> **Warning sign:** Users expressing frustration that the system's "best" recommendations feel irrelevant or tone-deaf.
* **Pitfall: Forcing reflection into public channels.** -> **Why it fails:** When users know their network is watching, they perform rather than reflect. For example, users rejected the idea of their private reflections automatically becoming Facebook status updates. -> **Warning sign:** Users self-censoring, using vague language, or abandoning a feature entirely because it feels unsafe or performative.

## 6. Key Quote / Core Insight

We pour massive amounts of personal data into digital systems, but today that data functions almost exclusively as a "window" through which corporations observe, model, and monetize our behavior. The profound, untapped opportunity of technology is to turn that data into a "mirror": presenting our own history back to us to foster self-understanding, strengthen relationships, and help us learn from the people we used to be.

## 7. Additional Resources & References

* **System:** SuggestBot - **Type:** Software tool - **Relevance:** Wikipedia recommendation engine built by Dan Cosley; a primary example of data acting as a "window."
* **Paper/Research:** Twitter Mood Analysis by Michael Macy and Scott Golder (Cornell) - **Type:** Academic research - **Relevance:** Demonstrates how massive datasets act as a window into global temporal and emotional patterns.
* **Theory:** Erving Goffman's *The Presentation of Self in Everyday Life* (front stage/back stage) - **Type:** Sociological theory - **Relevance:** Explains the tension between public social media posts and private personal reflection.
* **Research Concept:** Expressive Writing Paradigm (James Pennebaker, UT Austin) - **Type:** Psychological framework - **Relevance:** Supports the finding that simply writing about past experiences improves psychological well-being.
* **Hardware:** SenseCam (Microsoft Research; used in Gordon Bell's lifelogging work) - **Type:** Wearable hardware - **Relevance:** Mentioned as a tool that passively captures lifelogging data (taking a photo roughly every 30 seconds).
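Rules 4 and 5 favor simple time-based or random triggers over attempts to predict emotional value, and favor old items over recent ones. The Python sketch below shows one way such a prompt selector might look, under stated assumptions: the `Memory` record, `on_this_day`, `pick_prompt`, and the one-year `min_age_days` cutoff are all hypothetical illustrations, not code from Pensieve or any other system described in the talk. It prefers an "N years ago today" item and otherwise falls back to a uniformly random item at least a year old.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Memory:
    """Hypothetical archive record: one diary entry, photo caption, post, etc."""
    created: date
    text: str


def on_this_day(archive: list[Memory], today: date) -> list[Memory]:
    """Time-based trigger: items from the same month and day in earlier years."""
    return [
        m for m in archive
        if (m.created.month, m.created.day) == (today.month, today.day)
        and m.created.year < today.year
    ]


def pick_prompt(archive: list[Memory], today: date, rng: random.Random,
                min_age_days: int = 365) -> Memory | None:
    """Pick one past item to send back to the user, or None if nothing qualifies.

    Prefers an "N years ago today" anniversary; otherwise chooses uniformly at
    random among items at least `min_age_days` old (an assumed cutoff), so
    recent posts never crowd out older ones. No emotional-value scoring is done.
    """
    anniversaries = on_this_day(archive, today)
    if anniversaries:
        return rng.choice(anniversaries)
    old = [m for m in archive if (today - m.created).days >= min_age_days]
    return rng.choice(old) if old else None


# Usage with placeholder entries (illustrative data, not from the talk).
archive = [
    Memory(date(1998, 12, 20), "Placeholder entry A"),
    Memory(date(2001, 8, 20), "Placeholder entry B"),
    Memory(date(2013, 10, 1), "Placeholder entry C"),
]
rng = random.Random()
print(pick_prompt(archive, date(2013, 12, 20), rng).text)  # → Placeholder entry A
```

The selector only decides *what* to surface; in keeping with Rule 1, delivery would happen through a channel the user already checks, such as email or SMS, which this sketch leaves out.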