# Human-Computer Interaction Seminar: Rethinking the AI-UX Boundary for Designing Human-AI Experiences
**Video Category:** Technology / Human-Computer Interaction (HCI) / Product Design
## 0. Video Metadata
**Video Title:** Human-Computer Interaction Seminar: Rethinking the AI-UX Boundary for Designing Human-AI Experiences
**YouTube Channel:** Stanford Center for Professional Development
**Publication Date:** October 22, 2021
**Video Duration:** ~61 minutes
## 1. Core Summary (TL;DR)
The traditional software engineering approach of separating User Experience (UX) design from backend engineering fails when building AI-powered applications because AI behaviors cannot be fully specified at the interface level. This strict boundary creates "knowledge blindness," where engineers build models disconnected from user needs and designers create interfaces disconnected from technical limitations. To solve this, organizations must adopt a co-creation process using "leaky abstractions" (sharing raw data, model outputs, and user scenarios across disciplines) and utilize Model-Informed Prototyping tools to design and evaluate AI experiences simultaneously.
## 2. Core Concepts & Frameworks
* **Concept:** Human-Centered AIX (AI Experience) Stack -> **Meaning:** A layered framework for AI application design that extends beyond the GUI. It includes four interconnected layers: User Interface & Interactions, AI Behavior (performance, learnability), Implementation, and Training Data (collection, labeling, privacy). -> **Application:** Used as an analytical lens to ensure that user needs, such as task expectations and privacy concerns, are embedded all the way down to how training data is labeled, rather than just how the UI is drawn.
* **Concept:** Leaky Abstractions -> **Meaning:** The deliberate practice of sharing ad-hoc, low-level representational artifacts across strict disciplinary boundaries (e.g., engineers sharing raw model JSON outputs or computational notebooks; designers sharing qualitative codebooks or Wizard-of-Oz storyboards). -> **Application:** Serves as a communication bridge to alleviate "Technology Blindness" in designers and "End-User Blindness" in engineers, enabling them to negotiate design trade-offs before committing to code.
* **Concept:** Data Personas & Data Probes -> **Meaning:** Expanding traditional user personas by attaching a representative sample of raw data that the specific user type would generate or consume in the real world. -> **Application:** Used during cross-functional "cognitive walkthroughs." By looking at a specific persona's data (e.g., a parent's blurry, duplicate photos of a moving child), designers and engineers can collaboratively brainstorm exact model specifications (clustering by similarity) and UI fallbacks (handling blurry edge cases).
* **Concept:** Model-Informed Prototyping (MIP) -> **Meaning:** A prototyping workflow where real machine learning models are run against real datasets, and the resulting outputs (including confidence scores and errors) are injected directly into the UI design canvas. -> **Application:** Allows designers to instantly preview a UI design across hundreds of data instances simultaneously, visually highlighting where the AI fails so they can design specific "error states" or "mixed-initiative" interventions.
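The MIP workflow above can be sketched as a small pairing step: take real model outputs over a real dataset and attach the fields a design canvas would render (prediction, confidence, error flag) to each instance. The sketch below is a minimal illustration, not ProtoAI's actual API; the `preview_rows` helper and its field names are assumptions.

```python
# Minimal Model-Informed Prototyping sketch (hypothetical helper, not ProtoAI code):
# pair model outputs with ground truth so a design canvas can render every instance
# and flag the ones where the model fails.

def preview_rows(instances, predictions):
    """Attach per-instance model outputs and an error flag for UI preview."""
    rows = []
    for inst, (label, confidence) in zip(instances, predictions):
        rows.append({
            "id": inst["id"],
            "prediction": label,
            "confidence": confidence,
            "error": label != inst["truth"],  # mismatch vs. ground-truth label
        })
    return rows

birds = [{"id": 1, "truth": "robin"}, {"id": 2, "truth": "blue jay"}]
model_out = [("robin", 0.92), ("crow", 0.41)]  # second prediction is wrong
rows = preview_rows(birds, model_out)
# rows[1] carries error=True and low confidence, so the canvas would render
# an "Error" or low-confidence state for that instance instead of the default.
```

The point of the design is that the preview is faceted over the whole dataset at once: the designer sees hundreds of rendered instances, not one hand-picked "happy path" example.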
## 3. Evidence & Examples (Hyper-Specific Details)
* **[Conventional Software vs. AI Specification Example]:** The speaker contrasts designing a traditional 4-digit PIN unlock screen with designing a Face ID unlock screen. For the PIN, UX fully specifies the interaction (keypad layout, success/fail states) and hands it to engineering. For Face ID, the UI is just a lock icon; the actual "design" involves specifying lighting constraints, angle tolerances, and demographic representation in the training data, elements that cannot be communicated through standard UI wireframes.
* **[Technology Blindness / "AI-First" Failure Example]:** The Twitter automated image cropping algorithm (saliency model) was cited as an example where an "AI-First" engineering process created a capability without UX input, resulting in an interface that lacked manual cropping affordances. This over-trust in the AI led to public failures regarding racial bias. The speaker noted that retroactively fixing these issues requires costly rework and causes real-world harm, referencing the "Gender Shades" paper (Buolamwini & Gebru, 2018).
* **[End-User Blindness Example]:** During a video analysis project, engineers separated from UX were tasked with building a model. Lacking user context, the engineers experienced "end-user blindness" and wasted time attempting to build capabilities to identify irrelevant objects (like guns in the background) simply because the machine learning could do it, rather than focusing on the actual user task.
* **[Leaky Abstraction Example - Movie Recommendations]:** A UX team designing a movie recommendation system (similar to Netflix) used a leaky abstraction by providing engineers with user viewing history data tied to specific time-of-day contexts. The engineers then built a functional subset of the model and exposed the raw output logic to the UX team, explaining how the weights were set. This allowed the UX team to generate accurate "Because you watched..." explanations for the end-user.
* **[Data Personas in Photo Decluttering Study]:** In a lab study with 20 participants, teams were asked to build an AI to declutter photo albums. They were provided with four "Data Personas": "3D Rudy" (a grad student), "Brad the Dad", "Vegan Instagrammer", and "Corporate Margret." Crucially, each persona included 15 actual photos representing their typical camera roll. Looking at "Brad the Dad's" sequential, blurry photos of his kids prompted engineers to suggest "clustering based on similarity," while UX designers realized they needed to design an uncertainty state for photos where someone might appear not to be smiling because they had a stroke.
* **[ProtoAI Bird Classification Demonstration]:** The speaker demonstrated "ProtoAI," a custom tool. A designer uploads a CSV of bird images and ground-truth labels. An Image Classification model runs against the CSV, generating predicted labels and confidence scores. The designer maps these data elements onto a mobile phone wireframe.
* **[ProtoAI Faceted Evaluation Demonstration]:** In the ProtoAI tool, the designer switches to the "Data Previews" tab. The tool renders the UI for every bird image in the dataset. Images where the model's prediction does not match the ground-truth label are automatically flagged with an "Error" tag. The designer notices the model struggles (confidence < 60%) on certain birds. They use the tool's logic builder to create a new UI state: `IF [classify_acc] is less than [60], THEN [Show 'LowAccuracy' State]`. The new state replaces the single label with a checklist of the top 3 alternative labels and a text box for the user to "Enter your own," demonstrating a mixed-initiative design response to AI uncertainty.
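The logic-builder rule from the demo maps naturally onto a tiny state-selection function. A hedged sketch follows: the threshold (60) and the "LowAccuracy" state name come from the demonstration, while `choose_ui_state` and its return shape are illustrative assumptions rather than ProtoAI internals.

```python
# Sketch of the ProtoAI-style condition described above: given a prediction's
# confidence, pick which UI state the canvas should render. Hypothetical
# re-implementation for illustration, not actual ProtoAI code.

def choose_ui_state(confidence_pct, top_labels, threshold=60):
    if confidence_pct < threshold:
        # Mixed-initiative fallback: checklist of top-3 alternatives
        # plus an "Enter your own" free-text box, as in the demo.
        return {
            "state": "LowAccuracy",
            "choices": top_labels[:3],
            "allow_free_text": True,
        }
    # High confidence: show the single predicted label.
    return {"state": "Default", "label": top_labels[0], "allow_free_text": False}

choose_ui_state(42, ["sparrow", "finch", "wren", "robin"])
# -> LowAccuracy state: a 3-item checklist plus a free-text fallback
```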
## 4. Actionable Takeaways (Implementation Rules)
* **Rule 1: Dissolve the strict design-engineering hand-off boundary.** -> Do not allow UX designers to create pixel-perfect UI specs based on assumed AI capabilities, and do not allow engineers to build models without an end-user task model. Instead, mandate parallel "Vertical Prototyping" where the User Interface and the Model API are co-developed iteratively.
* **Rule 2: Mandate "Leaky Abstractions" to establish common ground.** -> Require engineering teams to expose raw model outputs, confidence scores, and feature weights to the design team. Require design teams to share raw user research data, qualitative codebooks, and storyboards directly with the engineering and data labeling teams.
* **Rule 3: Use Data Personas to drive feature definition.** -> When defining a new AI capability, do not brainstorm in the abstract. Create specific user profiles and attach real, representative raw data to those profiles. Use this data as the central artifact during cross-functional meetings to decide what the model should actually optimize for.
* **Rule 4: Design explicitly for AI uncertainty and failure.** -> Assume the AI will produce low-confidence or erroneous outputs. Use tools or spreadsheets to simulate these errors during the wireframing stage. Design "mixed-initiative" interfaces (like ambiguity widgets or fallback manual inputs) to keep the user in control when the model fails.
* **Rule 5: Implement Model-Informed Prototyping (MIP).** -> Stop testing UI wireframes with static, "happy path" placeholder data. Inject real model predictions based on real user data directly into your prototypes to discover interaction breakdowns before deploying to production.
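Rules 4 and 5 suggest a concrete trick for teams without a tool like ProtoAI: perturb a fraction of otherwise "happy path" prototype rows into synthetic errors and low-confidence outputs, so the wireframe is forced to exercise its failure states. A minimal sketch under assumed field names (`prediction`, `confidence`); the perturbation rate and placeholder label are illustrative.

```python
import random

# Hedged sketch for Rules 4-5: inject synthetic failures into prototype data
# so error states get designed before deployment. Field names are assumptions.

def simulate_model_errors(rows, error_rate=0.5, seed=0):
    """Return a copy of `rows` with a fraction turned into synthetic failures."""
    rng = random.Random(seed)  # seeded so the simulated errors are reproducible
    simulated = []
    for row in rows:
        row = dict(row)  # leave the caller's rows untouched
        if rng.random() < error_rate:
            row["prediction"] = "<wrong-label>"        # synthetic misclassification
            row["confidence"] = rng.uniform(0.1, 0.5)  # synthetic low confidence
        simulated.append(row)
    return simulated
```

Feeding the perturbed rows through the same prototype rendering path as the clean rows makes missing error states visible immediately, rather than after launch.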
## 5. Pitfalls & Limitations (Anti-Patterns)
* **Pitfall:** Designing conventional "Screen-to-Screen" flows for AI. -> **Why it fails:** AI introduces a highly open probability space with variable confidence levels. A static flow cannot account for the edge cases, False Positives, and False Negatives the system will inevitably generate. -> **Warning sign:** Design documents that assume the AI will always return the correct answer instantly, with no UI states defined for low confidence, system explanations, or user correction.
* **Pitfall:** The "AI-First" Workflow. -> **Why it fails:** Engineers build machine learning capabilities in a vacuum based on available datasets, then hand the model to a product team to "wrap a UI around it." This results in models that solve the wrong problem, fail on diverse demographic data, or conflict with actual user interaction expectations. -> **Warning sign:** UX designers are brought into a project only after the model architecture and training data have been finalized.
* **Pitfall:** Treating Training Data as "Non-Technical" or purely an engineering concern. -> **Why it fails:** If UX is not involved in defining the data requirements or annotation guidelines, the resulting model will optimize for metrics that do not align with the end-user's actual needs or mental models. -> **Warning sign:** Design teams complain that they were not invited to data annotation sessions, or engineers state that "the machine will figure it out" when asked about edge cases.
## 6. Key Quote / Core Insight
"They would hand that design document off to the engineer and say, 'Implement this.' And of course, my reaction to this was 'This is garbage.' This does not reflect the appropriate architecture for implementing this thing. It felt particularly extraneous when it got very granular, and it was not the best medium for describing the desired behavior."
*(Rewritten for impact: The traditional practice of UX handing a completed, pixel-perfect design specification to engineering is fundamentally broken for AI systems. Because AI behavior is probabilistic and deeply tied to underlying data architecture, granular UI specs are essentially 'garbage' unless they are co-designed alongside the model's capabilities and limitations.)*
## 7. Additional Resources & References
* **Resource:** *Gender Shades: Intersectional accuracy disparities in commercial gender classification* (Buolamwini, J., & Gebru, T., 2018) - **Type:** Academic Paper - **Relevance:** Cited as a primary example of why the "AI-First" workflow fails, resulting in models that lack demographic representativeness and cause harm because UX/Human-Centered perspectives were not included early in the data and design phase.
* **Resource:** ProtoAI - **Type:** Prototyping Tool (Academic Prototype) - **Relevance:** The custom tool developed by the researchers (Subramonyam et al., IUI '21) to enable Model-Informed Prototyping (MIP), allowing designers to evaluate UI designs against actual machine learning model outputs and data sets.
* **Resource:** RunwayML - **Type:** Tool/Website - **Relevance:** Mentioned as an example of a "Model Exploration" tool that is useful for testing AI models, but lacks the ability to evaluate how those models function within a user interface.
* **Resource:** Human-AI Interaction Design Guidelines (Microsoft, Google, Apple) - **Type:** Corporate Frameworks - **Relevance:** The researchers analyzed 280 of these industry guidelines to map out the theoretical best practices for designing AI experiences across UI, Behavior, Implementation, and Data layers.