# Mobile, Social, and Fashion: Three Stories from Data-Driven Design

**Video Category:** Human-Computer Interaction Research / Data-Driven Design

## 📋 0. Video Metadata

**Video Title:** Human-Computer Interaction Seminar - Mobile, Social, and Fashion: Three Stories from Data-Driven Design
**YouTube Channel:** Stanford Center for Professional Development
**Publication Date:** November 30, 2018
**Video Duration:** ~1 hour 3 minutes

## 📝 1. Core Summary (TL;DR)

This presentation outlines a research agenda focused on moving beyond purely aesthetic design by using large-scale data to tie design decisions to desired, measurable outcomes. It addresses the limitation that simple design examples lack the causal data needed to understand *why* a design works. By deploying tools for zero-integration performance testing, building massive semantic datasets (like RICO), and utilizing Wizard-of-Oz AI proxies, organizations can democratize design insights, allowing even small teams to make statistically backed decisions across mobile apps, fashion retail, and social media.

## 2. Core Concepts & Frameworks

* **Concept:** Data-Driven Design Thinking
  * **Meaning:** The integration of large-scale data collection, machine learning, and deployment directly into the traditional product design cycle (need-finding, ideation, creation, evaluation).
  * **Application:** Instead of A/B testing only at the end, data informs the initial ideation and need-finding phases, such as using chatbots to understand user vocabulary before building a recommendation engine.
* **Concept:** Zero-Integration Performance Testing (ZIPT)
  * **Meaning:** A methodology for capturing usability and interaction data from mobile applications without access to, or modification of, the app's source code or binary.
  * **Application:** Crowd-sourced workers interact with device-farm phones through a web proxy to test competitor apps (e.g., comparing Best Buy to Macy's) and establish industry baselines.
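At its core, a ZIPT-style report reduces to aggregating per-worker task sessions into completion and effort metrics. A minimal sketch of that aggregation, using hypothetical `Session` records rather than the actual ERICA pipeline:

```python
from dataclasses import dataclass

@dataclass
class Session:
    """One crowd worker's attempt at a task (hypothetical record)."""
    worker_id: str
    completed: bool
    seconds: float
    interactions: int  # taps, swipes, keystrokes

def aggregate(sessions):
    """Roll raw sessions up into ZIPT-style dashboard metrics."""
    done = [s for s in sessions if s.completed]
    return {
        "completion_rate": len(done) / len(sessions),
        "avg_seconds": sum(s.seconds for s in done) / len(done),
        "avg_interactions": sum(s.interactions for s in done) / len(done),
    }

sessions = [
    Session("w1", True, 61.0, 10),
    Session("w2", True, 85.0, 13),
    Session("w3", False, 120.0, 22),
]
print(aggregate(sessions))
```

The invented example data would yield a 67% completion rate with an average of 73 seconds and 11.5 interactions among completers, the same shape of numbers the dashboards described below report.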
* **Concept:** Design Semantics (Semantic Abstractions)
  * **Meaning:** The process of translating low-level structural data (pixels, tap coordinates, raw HTML/DOM) into higher-level, human-understandable concepts (e.g., "login screen", "hamburger menu", "bohemian style").
  * **Application:** Training an autoencoder to recognize that a login screen in App A is conceptually the same as a login screen in App B, enabling cross-app performance comparisons.
* **Concept:** Code-Switching in Design
  * **Meaning:** The translation required between the high-level, context-driven language used by consumers and the low-level, structural language used by domain experts or search engines.
  * **Application:** A fashion AI must translate a consumer's query for a "beach wedding outfit" into structural search terms like "maxi silhouette", "floral pattern", and "lightweight fabric".

## 3. Evidence & Examples (Hyper-Specific Details)

* **Webzeitgeist Project:** An early system built to mine web design at scale by parsing HTML, the DOM, visual segmentation, and screenshots. It allowed designers to query specific design patterns, such as "show me coffee websites with large photographic backgrounds" or "where should a mobile signup form be placed?"
* **The Limitations of A/B Testing (Qubit Meta-Analysis):** The speaker referenced a white paper analyzing 6,700 large e-commerce experiments. The data showed that 90% of A/B tests had less than a 1.2% effect on revenue. While even a 0.4% lift can be worth ~$15 million to a massive company, the analysis demonstrated that A/B testing requires massive traffic to achieve statistical significance, making it inaccessible to smaller design teams.
* **ZIPT and the ERICA System:** To capture data without code integration, the team built ERICA, which streams an Android app from a device farm to a crowd worker's browser. It continuously captures three data streams in the background: user interaction data (tap coordinates, gestures), high-resolution screenshots (including animations), and the UI hierarchy (the DOM tree of the Android app).
* **ZIPT Case Study: Calorie Counter App:** Crowd workers were tasked with adding "1 chocolate chip cookie" to a diet app. The ZIPT dashboard aggregated the results: an average time of 1:13 and 11.48 interactions, with a 100% completion rate. However, a flow visualization (Sankey diagram) revealed that users deviated from the "Golden Trace" (the designer's intended path) because an unexpected modal popped up requiring users to manually select a serving type, causing usability friction.
* **ZIPT Comparison: YouTube Music vs. Spotify:** Task: "Add two songs to a new playlist that you create." The study found an 89% completion rate for YouTube Music versus 98% for Spotify. Qualitative feedback revealed the flaw: YouTube Music lacked a button to create an empty playlist first; users had to select a song and then choose "add to new playlist," which violated user expectations.
* **ZIPT Comparison: Macy's vs. Best Buy:** Task: "Find the address of the store closest to 94102." Macy's used a hamburger menu leading to a store locator (6 interactions, 96% completion). Best Buy used tabbed navigation and a search bar (3 interactions, 83% completion). Despite industry trends moving away from hamburger menus, the data showed Macy's approach was more effective for this specific task, because Best Buy users got confused trying to use the general product search bar to find physical store locations.
* **The RICO Dataset:** To build semantic models, the team created a massive public dataset containing interaction traces for ~10k Android apps, 72k unique UIs, and 3M UI elements. It was generated by paying 13 Upwork workers to explore apps for 10 minutes each, totaling ~2.5k hours of usage at a cost of ~$20,000.
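The "Golden Trace" comparison from the calorie-counter study amounts to diffing each observed screen sequence against the designer's intended path. A minimal sketch, using hypothetical screen names (the `serving_modal` stand-in echoes the unexpected serving-type modal described above):

```python
def first_divergence(golden, observed):
    """Return (index, screen) where an observed trace first leaves the
    golden trace, or None if the user never diverges."""
    for i, (g, o) in enumerate(zip(golden, observed)):
        if g != o:
            return i, o
    if len(observed) > len(golden):  # extra steps after the golden path
        return len(golden), observed[len(golden)]
    return None

# Hypothetical screen sequences for the diet-app task.
golden = ["search", "food_detail", "log_entry", "diary"]
trace  = ["search", "food_detail", "serving_modal", "log_entry", "diary"]
print(first_divergence(golden, trace))  # → (2, 'serving_modal')
```

Aggregating `first_divergence` over many crowd-worker traces is what lets a Sankey-style view point at the exact screen causing friction.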
* **Android UX Lexicon:** Using RICO, the team categorized 73k icons and 130k text buttons into 197 concepts. They discovered "icon polysemes" (icons with multiple meanings, like a star meaning "rate" or "favorite") and identified concepts that strictly require text labels (e.g., "login", "subscribe") because no standard icon exists for them.
* **Fashion Needfinding via Wizard-of-Oz Chatbot:** To understand fashion consumer needs, the team deployed a Facebook Messenger bot secretly staffed by fashion students. Over 3 weeks, they collected 88 organic styling conversations from 73 users. They discovered that consumers rarely ask for specific items; instead, they ask context-based questions ("What do I wear to a yoga retreat in Colorado?" or "How do I look professional without looking boring?").
* **Visual Compatibility Model (Polyvore Data):** The team trained a model on 70,000 outfits from the site Polyvore. A key finding was that compatibility cannot be mapped in a single embedding space. Two different pairs of shoes might both perfectly match a specific top, yet look completely different from each other. The model required multiple embeddings to allow items to match in different ways.
* **Opico.io Social Network:** To solve the problem of sparse "taste data" on networks like Facebook, the team built Opico.io, an app where users check into locations using combinations of 5 emojis. This lowered the friction of curation, allowing the team to map out taste profiles and identify that the best "curators" of content in a network are rarely its "creators".

## 4. Actionable Takeaways (Implementation Rules)

* **Rule 1: Define "preferred outcomes" before collecting design examples.** Do not build mood boards or swipe files without establishing KPIs. Define exactly what metric a design must improve (task completion rate, time on task, bounce rate) before evaluating its utility.
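One way to operationalize this rule is to declare the KPIs as data before any design example is evaluated, then score every variant strictly against that declaration. A toy sketch (the KPI names, targets, and measurements below are invented for illustration):

```python
# Hypothetical KPI declaration, fixed *before* collecting design examples.
KPIS = {
    "completion_rate": {"target": 0.95, "higher_is_better": True},
    "time_on_task_s": {"target": 75.0, "higher_is_better": False},
}

def evaluate(measured):
    """Score a design variant only against the pre-declared KPIs."""
    report = {}
    for name, spec in KPIS.items():
        value = measured[name]
        report[name] = (value >= spec["target"]
                        if spec["higher_is_better"]
                        else value <= spec["target"])
    return report

print(evaluate({"completion_rate": 0.98, "time_on_task_s": 73.0}))
# → {'completion_rate': True, 'time_on_task_s': True}
```

Because the targets are written down first, a trendy design that fails its declared metric cannot be rationalized after the fact.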
* **Rule 2: Map the "Golden Trace" for core user flows.** Before testing an interface, explicitly map the optimal, minimum-interaction path a user should take. Use this as the baseline for measuring user deviations and identifying confusing UI elements.
* **Rule 3: Use zero-integration proxies for competitive analysis.** Do not wait to build a feature to test it. Use crowd-sourcing platforms wrapped around existing competitor apps to gather baseline usability data and learn from their mistakes before coding your own solution.
* **Rule 4: Train models on semantic abstractions, not raw pixels.** If building analytics or recommendation tools, use autoencoders to translate raw app screens into structural categories (e.g., "login layout", "search layout"). This allows scalable data aggregation across entirely different products.
* **Rule 5: Deploy Wizard-of-Oz systems for vocabulary extraction.** Before building a complex AI or search engine, deploy a human-powered chat interface to interact with users. Use it to capture the exact, messy, context-heavy vocabulary users naturally employ, and use that data to structure your backend ontology.
* **Rule 6: Design "code-switching" into recommendation engines.** Ensure your system acts as a translator between user intent and database structure. If a user inputs a contextual query ("beach wedding"), the system must automatically translate it into the rigid parameters the database requires ("maxi dress", "floral", "light fabric").

## 5. Pitfalls & Limitations (Anti-Patterns)

* **Pitfall:** Treating design repositories (like Dribbble or Pinterest) as definitive solutions.
  * **Why it fails:** Examples presented in isolation lack the causal data showing how they actually perform in the wild.
  * **Warning sign:** Teams copying a trendy UI element (like removing a hamburger menu) without verifying whether it hurts their specific user completion rates.
* **Pitfall:** Relying exclusively on A/B testing for design optimization.
  * **Why it fails:** A/B tests require massive user traffic to reach statistical significance, and mostly yield marginal gains (less than 1.2%). They are too expensive and slow for foundational design changes.
  * **Warning sign:** Startups with low traffic wasting weeks running A/B tests on button colors instead of testing entirely different user flows.
* **Pitfall:** Assuming single-dimensional visual similarity equals compatibility.
  * **Why it fails:** In complex domains like fashion, items match based on context and style, not just color or shape. A model that groups items by a single similarity metric will generate terrible recommendations.
  * **Warning sign:** A recommendation engine suggesting five nearly identical blue shirts instead of a pair of pants that matches a blue shirt.
* **Pitfall:** Expecting consumer users to provide structured search queries.
  * **Why it fails:** Consumers think in terms of events, problems, and feelings, while databases require strict taxonomies.
  * **Warning sign:** Users failing to find products because they search for "cute date outfit" but the database only accepts "red sleeveless midi dress".

## 6. Key Quote / Core Insight

"Everyone designs who devises courses of action aimed at changing existing situations into preferred ones... and the important word here is 'preferred'. Examples by themselves are not enough; we must tie design decisions to desired outcomes."

## 7. Additional Resources & References

* **Resource:** Herbert Simon, *The Sciences of the Artificial* - **Type:** Concept Source - **Relevance:** Provided the foundational definition of design as the act of changing existing situations into preferred ones.
* **Resource:** Qubit Meta-Analysis - **Type:** White Paper - **Relevance:** A study of 6,700 e-commerce experiments demonstrating the limited effect sizes of traditional A/B testing.
* **Resource:** ERICA / ZIPT (Zero-Integration Performance Testing) - **Type:** Research System (UIST 2017) - **Relevance:** The methodology used to test mobile apps without source-code access.
* **Resource:** RICO Dataset (interactionmining.org/rico) - **Type:** Open-Source Dataset - **Relevance:** A massive repository of mobile app interaction traces, UIs, and elements for training ML design models.
* **Resource:** Android UX Lexicon (lexicon.eaux.design) - **Type:** Database - **Relevance:** A categorized mapping of UI icons and text buttons across Android applications.
* **Resource:** Polyvore - **Type:** Social Commerce Website (defunct) - **Relevance:** The primary data source used to train the visual compatibility model for fashion outfits.
* **Resource:** Opico.io - **Type:** Mobile Application - **Relevance:** An emoji-based social network built by the researchers to efficiently mine dense user taste and curation data.
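As a closing sketch, the multi-embedding compatibility finding from section 3 (two dissimilar shoes both matching the same top) can be illustrated with toy vectors. Everything below is invented for illustration; the actual Polyvore model learned its embeddings from outfit data:

```python
import math

# Each item gets one embedding per "style subspace". Two items are
# compatible if they are close in ANY subspace; a single shared space
# could not make two very different shoes both match the same top.
ITEMS = {
    "blue_top":    [(0.90, 0.10), (0.20, 0.80)],  # subspace 0, subspace 1
    "white_heels": [(0.85, 0.15), (0.90, 0.10)],  # near the top in subspace 0
    "black_boots": [(0.10, 0.90), (0.25, 0.75)],  # near the top in subspace 1
}

def compatible(x, y, threshold=0.2):
    """Items match if any subspace places them within `threshold`."""
    return any(math.dist(ex, ey) <= threshold
               for ex, ey in zip(ITEMS[x], ITEMS[y]))

# Both shoes match the top, but via different subspaces...
print(compatible("blue_top", "white_heels"))   # → True  (subspace 0)
print(compatible("blue_top", "black_boots"))   # → True  (subspace 1)
# ...while the two shoes are far apart in every subspace.
print(compatible("white_heels", "black_boots"))  # → False
```

The design choice mirrors the talk's point: "matches X" is not a transitive, single-metric relation, so the model needs several ways for items to be close.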