The State of Design Knowledge in Human-AI Interaction

📂 General
# The State of Design Knowledge in Human-AI Interaction

**Video Category:** Human-Computer Interaction / AI Design

## 📋 0. Video Metadata

**Video Title:** The State of Design Knowledge in Human-AI Interaction
**YouTube Channel:** Stanford Center for Professional Development
**Publication Date:** March 1, 2024
**Video Duration:** ~1 hour 15 minutes

## 📝 1. Core Summary (TL;DR)

The field of Human-AI Interaction is rapidly deploying AI features based on reasonable-sounding but fundamentally unverified assumptions that often fail in practice. While some robust design patterns like Split User Interfaces have proven successful, common approaches like predictive text, simple explainable AI, and counterfactual recourse frequently backfire by altering user intent, inducing dangerous over-reliance, or ignoring real-world constraints. To build effective systems, designers must abandon superficial heuristic-based AI integration in favor of empirically validated, sociotechnical design knowledge that forces deep cognitive engagement and accounts for complex systemic contexts.

## 2. Core Concepts & Frameworks

* **Split User Interfaces:** (see the sketch after this list)
  - **Meaning:** A design pattern where an AI-powered, highly efficient method for performing a task is offered as an *alternative* alongside the traditional, robust manual method, rather than replacing it.
  - **Application:** Font selection menus where the AI predicts and surfaces a small list of highly probable fonts at the top, while the full alphabetical list remains accessible below for users who wish to bypass the AI.
* **Loss Aversion in AI:**
  - **Meaning:** A cognitive bias where users perceive the cost and frustration of an AI system's mistakes as significantly larger than the benefits gained when the AI functions correctly.
  - **Application:** Because users severely penalize AI errors, AI interventions must offer massive, asymmetrical benefits to overcome the perceived cost of their inevitable inaccuracies.
* **Cognitive Forcing:**
  - **Meaning:** An intervention applied at the moment of decision-making, designed to disrupt a user's superficial, heuristic processing of information and force deeper, analytical engagement.
  - **Application:** Requiring a user to explicitly record their own initial decision or hypothesis before the system reveals the AI's recommendation and explanation.
* **Incidental Learning:**
  - **Meaning:** Unintentional learning that occurs as a byproduct of a user interacting with a system to complete a task.
  - **Application:** Used as a proxy metric for true "cognitive engagement" with an AI; if a user performs better on a task *without* the AI after having used the AI previously, it shows they analytically engaged with the AI's explanations rather than blindly accepting them.
* **Algorithmic Recourse / Counterfactuals:**
  - **Meaning:** Explanations provided to a user after an algorithmic rejection that dictate exactly how they must change their features to achieve a positive outcome (e.g., "If your income were $10,000 higher, you would be approved").
  - **Application:** Used in automated loan approvals or public benefit systems to theoretically provide applicants with an actionable path to reverse a denial.
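
To make the split-interface pattern concrete, here is a minimal TypeScript sketch (not from the talk) of a font picker that layers AI suggestions on top of an untouched manual list. The `predictFonts` ranking function and the data shapes are illustrative assumptions, not any product's actual API.

```typescript
// Minimal sketch of a Split User Interface for font selection:
// the AI-ranked shortcut is layered on top; the full manual list is never removed.

interface FontMenu {
  suggested: string[]; // AI-predicted shortcuts, shown first
  allFonts: string[];  // complete alphabetical list, always available below
}

// Stand-in "model": most recently used fonts first, deduplicated.
function predictFonts(recentlyUsed: string[], allFonts: string[], k = 3): string[] {
  return [...new Set(recentlyUsed)].filter(f => allFonts.includes(f)).slice(0, k);
}

function buildFontMenu(allFonts: string[], recentlyUsed: string[]): FontMenu {
  return {
    suggested: predictFonts(recentlyUsed, allFonts),
    // Key property of the pattern: the manual path stays complete and predictable.
    allFonts: [...allFonts].sort(),
  };
}

const menu = buildFontMenu(
  ["Courier", "Garamond", "Helvetica", "Arial", "Baskerville"],
  ["Garamond", "Helvetica", "Garamond"],
);
console.log(menu.suggested); // ["Garamond", "Helvetica"]
console.log(menu.allFonts);  // ["Arial", "Baskerville", "Courier", "Garamond", "Helvetica"]
```

The defining property is that a user who ignores `suggested` entirely still gets the full, predictable manual path.
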
## 3. Evidence & Examples (Hyper-Specific Details)

* **Adaptive User Interfaces (Microsoft Smart Menus):** Microsoft attempted to solve "feature bloat" in Word by creating menus that automatically hid rarely used features. It failed because while most people only use 20% of the features, every individual uses a *different* 20%, and the system violated predictability. This led to the successful "Split User Interface" pattern seen in modern predictive keyboards, which offer suggestions but do not remove the standard QWERTY keyboard.
* **Predictive Text Impact on Content (Image Captioning Study):** Researchers (Arnold, Chauncey, Gajos, IUI 2020) asked users to caption images of baseball players and trains. Users equipped with a predictive text engine wrote significantly shorter captions, routinely skipping descriptive adjectives and adverbs (e.g., omitting the word "outdoor"). Users also substituted the AI's suggestions for their intended words (e.g., using "on" instead of "approaching" because the AI suggested it), showing that predictive text changes *what* people write, not just *how fast* they write it.
* **Predictive Text Bias (Restaurant Review Study):** Researchers (Arnold, Chauncey, Gajos, GI 2018) asked users to recall four recent restaurant experiences (two positive, two negative) and assign them star ratings. Users then wrote the reviews using a predictive keyboard. Unbeknownst to them, the keyboard was trained on either a heavily positive or a heavily negative Yelp corpus. Independent evaluators blindly rated the resulting text: reviews written with the positive-slanted AI were rated as having significantly more positive sentiment than those written with the negative-slanted AI, even though the users were reviewing the same underlying experiences.
* **Explainable AI in Medical Decisions (Antidepressant Selection):** A study compared antidepressant selection by clinicians alone, by the AI alone, and by clinicians supported by an AI that provided explanations. The combined team performed *worse* than the AI alone. Explanations did not reduce over-reliance; clinicians used the mere presence of a detailed, factual-sounding explanation as a "heuristic for competence." They assumed the AI was smart because it provided data, and accepted incorrect AI suggestions without analytically engaging with the explanation's actual content.
* **Cognitive Forcing (Nutrition Study):** Buçinca et al. (CSCW 2021) asked users to identify which of two meals had more protein, comparing a baseline (no AI), simple explainable AI, and an "Update Design" (cognitive forcing). In the Update Design, users had to select an answer *before* seeing the AI recommendation, then were allowed to update their choice (see the sketch after this list). The cognitive forcing condition was the *only* condition that produced incidental learning and a significant reduction in over-reliance on incorrect AI suggestions.
* **Algorithmic Recourse (Career Counselor Study):** Upadhyay et al. tested how to help students who were denied an internship. The AI provided either a counterfactual ("take 2 more pharmacology courses") or reason codes (ranking features by importance: 1. Pharmacology, 2. Anatomy). Because courses were offered in different semesters (a hidden constraint the AI did not understand), the counterfactual path actually required 3 extra semesters. The reason codes condition let users see that Anatomy was also important, allowing them to construct an alternative schedule that required only 1 extra semester.
* **Public Service Benefits (Boston Housing / Chennai Land):** Karusala et al. (CHI 2024) studied how people navigate algorithmic public services. They found that counterfactuals are useless because the real barriers occur before the algorithm even runs: people don't know they are eligible, lack prerequisites (like ID cards), or require "accompaniment" (social workers) to navigate the bureaucracy. Furthermore, when denied, people don't want to change their features; they want to argue that their specific context warrants an *exception* to the algorithmic rule.
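
The Update Design from the nutrition study can be sketched as a small interaction state machine in which the AI recommendation is only revealed after the user commits an initial answer. This is an illustrative sketch under assumed names (`Decision`, `commitInitialAnswer`, `finalize`), not code from the study.

```typescript
// Sketch of the "Update Design" cognitive forcing flow: the AI recommendation
// is withheld until the user commits an initial answer, which they may then update.

type Stage = "awaiting-user-answer" | "ai-revealed" | "finalized";

interface Decision<T> {
  stage: Stage;
  initialAnswer?: T;    // what the user committed to before seeing the AI
  aiRecommendation?: T; // revealed only after that commitment
  finalAnswer?: T;
}

function commitInitialAnswer<T>(
  d: Decision<T>,
  answer: T,
  getAiRecommendation: () => T,
): Decision<T> {
  if (d.stage !== "awaiting-user-answer") throw new Error("Initial answer already committed");
  // Only now is the AI output fetched and shown.
  return { ...d, stage: "ai-revealed", initialAnswer: answer, aiRecommendation: getAiRecommendation() };
}

function finalize<T>(d: Decision<T>, finalAnswer: T): Decision<T> {
  if (d.stage !== "ai-revealed") throw new Error("Commit an initial answer first");
  return { ...d, stage: "finalized", finalAnswer };
}

// Usage: which of two meals has more protein?
let d: Decision<"meal A" | "meal B"> = { stage: "awaiting-user-answer" };
d = commitInitialAnswer(d, "meal A", () => "meal B"); // the AI happens to disagree
d = finalize(d, "meal B");                            // the user may update after seeing the AI
console.log(d.initialAnswer, d.aiRecommendation, d.finalAnswer);
```

Recording the initial answer also gives the designer a direct measure of how often users switch away from their own answer toward an incorrect AI recommendation.
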
## 4. Actionable Takeaways (Implementation Rules)

* **Rule 1: Implement Split Interfaces for Automation.** When using AI to improve efficiency, do not hide or replace the standard manual workflow. Provide the AI-generated shortcut (like suggested files, fonts, or predicted text) in a distinct, unobtrusive UI layer while maintaining the fully manual direct-manipulation option for users operating on "autopilot."
* **Rule 2: Force Friction in High-Stakes AI Support.** Do not default to showing AI recommendations and explanations simultaneously with the problem. To prevent heuristic over-reliance, force the user to commit to an initial decision, or answer specific guiding questions regarding the data, *before* revealing the AI's conclusion.
* **Rule 3: Expose Reason Codes Over Single Counterfactuals.** When explaining algorithmic rejections, do not prescribe a single "if-then" path for the user to fix their application. Instead, provide "Reason Codes" that rank the variables by importance, allowing the user to map those variables against their own hidden constraints (schedules, finances, childcare) to find an optimal path forward (see the sketch after this list).
* **Rule 4: Design for Exceptions and Contestability.** In expert domains (medicine, public policy), do not design AI to just output definitive answers. Design the UI to help users explore "what-if" scenarios, manipulate variables interactively, and build arguments for why a specific case should be treated as an exception to the AI's baseline rule.
* **Rule 5: Use Incidental Learning as Your Success Metric.** To verify if your Explainable AI actually works, test users on a related task without the AI after they have used your system. If their unassisted performance does not improve, your UI design is fostering superficial heuristic processing, not true cognitive engagement.
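
Rule 3 can be illustrated with a short sketch contrasting the two output shapes: a single prescriptive counterfactual versus ranked reason codes that the user reconciles with constraints the model never sees. The field names and importance values are assumptions, loosely echoing the career-counselor example above.

```typescript
// Two shapes for explaining an algorithmic rejection.

// (a) A single counterfactual prescribes one fixed path back to approval.
interface Counterfactual {
  change: string; // e.g. "take 2 more pharmacology courses"
}

// (b) Reason codes rank the features that drove the decision and leave the
//     choice of path to the user, who knows constraints the model never sees.
interface ReasonCode {
  feature: string;
  importance: number; // higher = contributed more to the rejection
}

const counterfactual: Counterfactual = { change: "take 2 more pharmacology courses" };

const reasonCodes: ReasonCode[] = [
  { feature: "pharmacology coursework", importance: 0.62 },
  { feature: "anatomy coursework", importance: 0.31 },
];

// A user-side constraint the model never saw: which courses run next semester.
const offeredNextSemester = new Set(["anatomy coursework"]);

// With reason codes, the user can act on the most important feature that is
// actually feasible for them, instead of being locked into one prescription.
const actionable = reasonCodes
  .filter(rc => offeredNextSemester.has(rc.feature))
  .sort((a, b) => b.importance - a.importance);

console.log(counterfactual.change);  // rigid path; may cost extra semesters
console.log(actionable[0]?.feature); // "anatomy coursework"
```

The counterfactual fixes one path in advance; the reason codes leave the trade-off to the person who actually knows the scheduling constraints.
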
## 5. Pitfalls & Limitations (Anti-Patterns)

* **Pitfall:** Assuming predictive text only impacts typing speed.
  - **Why it fails:** Users subconsciously accept AI suggestions that are "close enough" to their intent to save cognitive effort, substituting the model's generalized, statistically probable (and potentially biased) vocabulary for their own specific vocabulary.
  - **Warning sign:** User output becomes shorter, stripped of descriptive nuance (adjectives/adverbs), and mirrors the tonal slant of the training data.
* **Pitfall:** Believing simple explanations (XAI) increase critical thinking.
  - **Why it fails:** Humans are cognitive misers. They do not read the explanation analytically; instead, they treat the sheer volume of "factual-sounding" data points as a heuristic that proves the AI is highly competent, leading to blind trust.
  - **Warning sign:** Users routinely agree with the AI even when the AI suggests objectively incorrect or harmful actions, yet report high confidence in the system.
* **Pitfall:** Relying on counterfactuals for algorithmic recourse.
  - **Why it fails:** The algorithm assumes it has perfect knowledge of a user's life. It prescribes a path that may be mathematically optimal but practically impossible due to hidden human constraints, while actively blinding the user to viable alternative solutions.
  - **Warning sign:** Users receive actionable instructions to reverse a denial, but fail to execute them because the instructions conflict with real-world logistics (e.g., time, money, schedules).
* **Pitfall:** Applying cognitive forcing to expert clinicians.
  - **Why it fails:** Forcing a doctor to "guess" a diagnosis before seeing the AI data violates their professional workflow and their desire for collaborative, shared decision-making with the system and the patient.
  - **Warning sign:** High rates of system abandonment, user frustration, and complaints that the software feels like a "test" rather than a tool.

## 6. Key Quote / Core Insight

> "The rapid deployment of AI is driven by reasonable-sounding but fundamentally untested assumptions. When we elevate these unverified assumptions to design axioms, we build systems that actively degrade human decision-making, alter human intent, and induce dangerous over-reliance."

## 7. Additional Resources & References

* **Resource:** "Are We All in the Same Bloat?" by Joanna McGrenere
  - **Type:** Paper
  - **Relevance:** Foundational research explaining why simply hiding unused software features fails, leading to the Split UI concept.
* **Resource:** "Predictive Text Encourages Predictable Writing" by Kenneth Arnold, Krysta Chauncey, and Krzysztof Gajos (IUI 2020)
  - **Type:** Paper
  - **Relevance:** Empirical evidence that predictive text shortens user writing and alters word choice.
* **Resource:** "Sentiment Bias in Predictive Text Recommendations Results in Biased Writing" by Kenneth Arnold, Krysta Chauncey, and Krzysztof Gajos (GI 2018)
  - **Type:** Paper
  - **Relevance:** Demonstrates that training-data bias in predictive models actively changes the sentiment of human output.
* **Resource:** "To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI..." by Zana Buçinca et al. (CSCW 2021)
  - **Type:** Paper
  - **Relevance:** Introduces the "Update Design" and shows that forced friction is required to generate true cognitive engagement with AI.
* **Resource:** "Countering Counterfactuals: Challenging a Paradigm in Algorithmic Recourse" by Sohini Upadhyay, Himabindu Lakkaraju, and Krzysztof Gajos
  - **Type:** Working Paper
  - **Relevance:** Shows that Reason Codes outperform Counterfactuals when users have hidden real-world constraints.
* **Resource:** "Understanding Contestability on the Margins: Implications for the Design of Algorithmic Decision-making in Public Services" by Naveena Karusala et al. (CHI 2024)
  - **Type:** Paper
  - **Relevance:** Highlights that algorithmic recourse fails to address the actual, pre-algorithmic barriers faced by marginalized populations.
* **Resource:** "Explainable AI is Dead, Long Live Explainable AI! Hypothesis-driven Decision Support using Evaluative AI" by Tim Miller (FAccT 2023)
  - **Type:** Paper
  - **Relevance:** Suggests moving away from single-decision recommendations toward systems that present balanced pros and cons.
* **Resource:** "Geek Heresy: Rescuing Social Change from the Cult of Technology" by Kentaro Toyama
  - **Type:** Book
  - **Relevance:** Referenced regarding the concept that the best technological interventions are often indirect (e.g., supporting the human "accompaniment" rather than the end user directly).