# Human-Centered Explainable AI: From Algorithms to User Experiences

**Video Category:** AI Research & Human-Computer Interaction

## 📋 0. Video Metadata

**Video Title:** Human-Computer Interaction Seminar: Human-Centered Explainable AI: From Algorithms to User Experiences
**YouTube Channel:** Stanford Center for Professional Development
**Publication Date:** February 17, 2023
**Video Duration:** ~1 hour 21 minutes

## 📝 1. Core Summary (TL;DR)

The field of Explainable AI (XAI) has traditionally focused on algorithmic capabilities, often failing to address the actual cognitive needs and workflow contexts of end-users. This presentation introduces a human-centered approach that reframes technical XAI methods around specific user questions, moving from exhaustive "data dumps" to targeted, selective explanations. By understanding the dual cognitive processes that lead to blind over-reliance on AI, practitioners can design XAI systems that foster appropriate trust, force meaningful cognitive engagement, and ultimately improve joint human-AI decision-making.

## 2. Core Concepts & Frameworks

* **Human-Centered Explainable AI (HCXAI):**
  -> **Meaning:** A paradigm shift from viewing XAI merely as "interpreting model weights" to broadly "making AI understandable by people." It encompasses the entire user interaction, focusing on diverse stakeholder needs, workflow integration, and the socio-technical environment rather than just the algorithm itself.
  -> **Application:** Designing interfaces for loan officers or doctors where the explanation answers their specific task-oriented questions rather than showing raw mathematical feature attributions.
* **Question-Driven XAI Design:**
  -> **Meaning:** A four-step iterative design framework that uses user questions as the foundational building blocks. It involves identifying user questions, analyzing them, mapping them to specific XAI technical solutions via a "Question Bank," and iteratively evaluating the design. (A minimal mapping sketch follows this list.)
  -> **Application:** Instead of asking a developer to "implement LIME or SHAP," a designer asks, "What if the user changes this input?" and maps that specific user question to a "Counterfactual" XAI algorithm to build a what-if slider in the UI.
* **Dual Cognitive Processes in XAI (System 1 vs. System 2):**
  -> **Meaning:** A psychological framework contrasting "System 2" (slow, analytical, careful thinking) with "System 1" (fast, automatic, heuristic-based thinking). XAI designers often falsely assume users will engage System 2 to read explanations; in reality, under time pressure, users default to System 1.
  -> **Application:** Explains the phenomenon of "over-reliance," where users treat the mere presence of a complex feature-importance chart as a heuristic for trustworthiness, blindly following the AI even when it is wrong.
* **Selective Explanations:**
  -> **Meaning:** A communication theory principle stating that humans rarely explain events by listing all possible causes; they select the one or two causes that are most relevant, abnormal, or changeable. Selective XAI mimics this by filtering algorithmic output based on human intuition.
  -> **Application:** In a text classification interface, instead of highlighting every word's mathematical weight, the system "grays out" words that contradict human intuition, focusing the user's attention on the key signals the model used and making errors instantly visible.
* **Contextualized Evaluation of XAI:**
  -> **Meaning:** The concept that XAI methods cannot be evaluated using universal, context-free metrics. Evaluation criteria (e.g., faithfulness, comprehensibility, compactness) must be selected based on the specific downstream usage context and the user's goals.
  -> **Application:** Choosing to optimize an explanation for "comprehensibility" when designing a high-speed decision support tool for an emergency room, but optimizing for strict "faithfulness" when building an auditing tool for regulatory compliance.
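To make the question-driven mapping concrete, below is a minimal sketch (not from the talk) that pairs Question Bank categories with candidate XAI technique families and routes a raw user question to one of them. The `QUESTION_TO_METHOD` table, the `suggest_methods` helper, and its keyword matching are illustrative assumptions; only the category names and the pairings explicitly mentioned elsewhere in this summary (Why → local feature importance, What if → counterfactuals, How to be that → actionable counterfactuals) come from the source material.

```python
# Illustrative sketch: mapping XAI Question Bank categories to candidate
# technique families. Names and matching logic are hypothetical.

QUESTION_TO_METHOD = {
    "why": ["local feature importance (e.g., LIME/SHAP-style attribution)"],
    "why not": ["contrastive explanations", "counterfactual examples"],
    "how (global)": ["global surrogate model", "rule or knowledge distillation"],
    "what if": ["interactive counterfactuals (what-if sliders)"],
    "how to be that": ["actionable counterfactuals / recourse"],
    "how to still be this": ["minimal-change / robustness analysis"],
    "performance": ["accuracy, calibration, and error breakdowns"],
    "data": ["training data summaries and provenance"],
    "output": ["description of the output space and how to act on it"],
}

def suggest_methods(user_question: str) -> list[str]:
    """Return candidate XAI technique families for a raw user question.

    Very naive keyword routing; in a real design process the elicited
    questions would be analyzed and mapped collaboratively by designers
    and data scientists.
    """
    q = user_question.lower()
    if "what if" in q:
        return QUESTION_TO_METHOD["what if"]
    if "why not" in q or "instead of" in q:
        return QUESTION_TO_METHOD["why not"]
    if q.startswith("why"):
        return QUESTION_TO_METHOD["why"]
    if "change" in q or "lower" in q or "improve" in q:
        return QUESTION_TO_METHOD["how to be that"]
    return QUESTION_TO_METHOD["how (global)"]

if __name__ == "__main__":
    print(suggest_methods("Why is this patient flagged as high risk?"))
    print(suggest_methods("What if the user changes this input?"))
```

The keyword routing is only a stand-in; the point of the sketch is that the unit of design is the user's question, not the algorithm.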
## 3. Evidence & Examples (Hyper-Specific Details)

* **XAI Algorithms vs. XAI Toolkits:** The speaker contrasts raw algorithmic research (e.g., global knowledge approximation, local feature importance, counterfactuals) with the rise of open-source industry toolkits that lower the barrier to entry, specifically naming IBM's AIX360, Microsoft's InterpretML, Captum, Alibi, and Skater.
* **The XAI Question Bank (CHI 2020):** Based on interviews with 20 designers working on 16 AI products, Liao's team mapped over 50 common user questions into 9 categories. These categories act as "boundary objects" between designers and data scientists: *Data, Output, Performance, How (global), Why (local), Why Not, How to be that, How to still be this, and What if.*
* **Healthcare Adverse Event Risk Prediction (IBM):** A real-world application predicting 30-day risk of all-cause hospital admission. The interface was structured entirely around user questions: a "Why" tab mapped to local feature importance, a "Performance" tab mapped to historical accuracy metrics, and a "How to be that" tab mapped to actionable counterfactuals (e.g., changes the patient could make to lower risk).
* **The Pitfall of Over-Reliance (FAccT 2020 Study):** A controlled experiment by Yunfeng Zhang and team using a semi-toy task (predicting whether a profile belongs to a high-income group based on tabular data). When the AI model was *wrong*, users shown a standard feature-importance explanation performed *worse* than users shown no explanation at all, showing that complex explanations can induce blind trust rather than critical oversight.
* **Placebic Explanations (CHI 2020 Study):** Cited research by Eiband et al. demonstrating that showing users an explanation that looks technical but contains no actual semantic value ("placebic explanations") still increased user trust, further validating the danger of System 1 heuristic thinking.
* **Feature-Based vs. Example-Based Explanations (Valerie Chen et al.):** In a study comparing explanation types, feature-based explanations (showing variable weights) caused over-reliance when the model was wrong. Example-based explanations (showing nearest-neighbor historical profiles with their ground truth) did *not* cause over-reliance and actually led to "complementary performance," where human + AI outperformed the AI alone.
* **Movie Sentiment Task & Selective Explanations (Chenhao Tan's Lab):** A model evaluated movie reviews. When the model incorrectly predicted a negative review as positive, standard highlighting showed relevant words but also irrelevant words like "Western." The researchers introduced a "Selective Explanation" layer that elicited human beliefs about relevant words and "grayed out" the model's highlights that didn't match human intuition. This selective visual disruption increased users' ability to spot the model's error from ~35% to ~43%. (A minimal sketch of this filtering idea appears after this list.)
* **Scenario-Based Evaluation Survey (HCOMP 2022):** A survey across experts and crowdworkers evaluating which XAI metrics matter when. Results showed extreme variance: "Comprehensibility" was rated highly important when efficiency matters (fast decision making), whereas "Translucence" (communicating unreliability) was deemed critical by experts but completely ignored in most current XAI benchmark tests.
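A minimal sketch of the selective-explanation filtering idea from the movie-review example: keep the model's word-level attributions, but gray out highlighted words that were not among the words people judged relevant. The data structures, the `selective_highlights` function, and its threshold are assumptions for illustration, not the study's actual interface.

```python
# Minimal sketch of a "selective explanation" filter: keep the model's word
# highlights, but gray out those that do not match elicited human beliefs.
# Data structures, threshold, and function name are illustrative assumptions.

def selective_highlights(word_weights, human_relevant_words, threshold=0.1):
    """Combine model attributions with human-elicited relevance.

    word_weights: dict mapping word -> attribution weight from the model.
    human_relevant_words: set of words people judged relevant to the task.
    Returns (word, weight, style) tuples: "highlight" for words the model
    used *and* humans consider relevant, "grayed" for words the model used
    but humans consider irrelevant (a potential unreliability signal).
    """
    rendered = []
    for word, weight in word_weights.items():
        if abs(weight) < threshold:
            continue  # not part of the model's explanation at all
        style = "highlight" if word.lower() in human_relevant_words else "grayed"
        rendered.append((word, weight, style))
    return rendered

if __name__ == "__main__":
    # Toy example loosely following the movie-review scenario in the talk.
    weights = {"brilliant": 0.6, "boring": -0.2, "western": 0.5, "the": 0.01}
    human_words = {"brilliant", "boring", "plot", "acting"}
    for word, weight, style in selective_highlights(weights, human_words):
        print(f"{word:10s} weight={weight:+.2f} -> {style}")
```

Here "western" ends up grayed out because it carries model weight but no human-judged relevance, which is exactly the mismatch the study used to make errors easier to spot.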
## 4. Actionable Takeaways (Implementation Rules)

* **Rule 1: Map User Questions to Technical Solutions** - Do not start by choosing an XAI algorithm (like LIME or SHAP). Elicit specific questions from your users (e.g., "Why is this instance predicted to be X?"). Use the XAI Question Bank to map that specific question to the correct algorithmic class (Local Feature Importance) and design the UI to answer that exact question.
* **Rule 2: Use Questions as Cross-Functional Boundary Objects** - Use specific user questions as the shared language between UX designers (who understand user intent) and Data Scientists (who understand algorithmic capability). A designer saying "the user needs to know how to change this outcome" gives the data scientist a clear directive to implement Counterfactual generation.
* **Rule 3: Design to Disrupt "System 1" Thinking** - Assume users will not carefully analyze complex charts. Do not rely on dense feature attributions. You must design visual interventions that break cognitive heuristics, such as explicitly signaling uncertainty or highlighting anomalies that force the user to slow down and engage "System 2" thinking.
* **Rule 4: Utilize Example-Based Explanations for Abstract Tasks** - When users lack the domain expertise to reason about abstract mathematical feature weights, provide example-based explanations. Show them the 3 most similar historical cases alongside their ground-truth outcomes, allowing the user to use inductive reasoning to form their own intuition about the model's logic (see the first sketch after this list).
* **Rule 5: Implement Selective Explanations to Expose Errors** - Instead of showing all variables the model used, filter the explanation through human intuition. If the model relies heavily on a variable that human experts consider irrelevant, visually isolate or "gray out" that mismatch. This immediately signals an "unreliability pattern" to the user, improving error detection.
* **Rule 6: Define Evaluation Metrics by Usage Context** - Before testing your XAI interface, define the downstream context. If the goal is "Model Debugging," optimize for Faithfulness and Completeness. If the goal is "Decision Support" under time pressure, optimize for Comprehensibility and Compactness. Never evaluate XAI using a generic, one-size-fits-all metric (see the second sketch after this list).
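For Rule 4, a minimal sketch assuming tabular data and scikit-learn's `NearestNeighbors`: retrieve the three most similar historical cases and present them with their ground-truth outcomes. The toy features and labels (loosely echoing the income-prediction task above) are illustrative assumptions, not the setup from the cited studies.

```python
# Sketch of an example-based explanation: show the k most similar historical
# cases with their ground-truth outcomes next to the model's prediction.
# Toy data and feature choices are illustrative assumptions; real features
# would typically be scaled before computing distances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Historical cases: e.g., [age, years_of_education, hours_per_week]
X_history = np.array([
    [25, 12, 40],
    [47, 16, 50],
    [38, 14, 45],
    [52, 18, 60],
    [29, 12, 35],
], dtype=float)
y_history = ["<=50K", ">50K", "<=50K", ">50K", "<=50K"]  # ground-truth labels

nn = NearestNeighbors(n_neighbors=3).fit(X_history)

def explain_by_examples(query_case):
    """Return the 3 most similar historical cases and their outcomes."""
    distances, indices = nn.kneighbors(np.asarray([query_case], dtype=float))
    return [
        {"case": X_history[i].tolist(), "outcome": y_history[i], "distance": float(d)}
        for d, i in zip(distances[0], indices[0])
    ]

if __name__ == "__main__":
    for neighbor in explain_by_examples([45, 16, 55]):
        print(neighbor)
```

Showing the retrieved cases with their actual outcomes lets users reason inductively ("cases like this one usually turned out to be X") instead of parsing abstract feature weights.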
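And for Rule 6, a small sketch of encoding metric priorities per downstream context up front and refusing to evaluate without one. The context names and metric lists simply restate the examples in this summary; they are a starting point, not an official taxonomy.

```python
# Sketch: select XAI evaluation criteria from the downstream usage context.
# Context names and metric lists mirror the examples in this summary; they
# are illustrative, not an exhaustive or standardized taxonomy.

EVAL_PRIORITIES = {
    "model_debugging": ["faithfulness", "completeness"],
    "decision_support_time_pressure": ["comprehensibility", "compactness"],
    "regulatory_audit": ["faithfulness"],
}

def pick_metrics(usage_context: str):
    """Return the evaluation criteria to prioritize for a usage context."""
    try:
        return EVAL_PRIORITIES[usage_context]
    except KeyError:
        raise ValueError(
            f"No metric profile defined for context '{usage_context}'; "
            "define one before evaluating rather than falling back to a "
            "generic, one-size-fits-all benchmark."
        )

if __name__ == "__main__":
    print(pick_metrics("decision_support_time_pressure"))
```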
## 5. Pitfalls & Limitations (Anti-Patterns)

* **Pitfall:** Defaulting to exhaustive feature-importance charts (e.g., showing a massive bar chart of all variables).
  -> **Why it fails:** It causes cognitive overload. Users abandon analytical thinking (System 2), fall back on heuristics (System 1), and assume the model is correct simply because the explanation looks complex and authoritative.
  -> **Warning sign:** "Over-reliance" — users agree with the AI even when it is blatantly wrong, resulting in worse performance than if they had no AI assistance at all.
* **Pitfall:** Treating a user persona as having a single, static explanation need (e.g., "The Manager needs a dashboard").
  -> **Why it fails:** A single user's goals change depending on the phase of interaction. A manager checking a system for compliance requires entirely different explanations than a manager trying to contest a specific automated decision.
  -> **Warning sign:** Users abandon the XAI tools during specific parts of their workflow, complaining the data is irrelevant to their immediate task.
* **Pitfall:** Optimizing purely for algorithmic "faithfulness" (how accurately the explanation represents the math of the model).
  -> **Why it fails:** A perfectly faithful explanation of a deep neural network is often incomprehensible to a human. If the human cannot process the information quickly, the faithful explanation is operationally useless.
  -> **Warning sign:** High scores on technical XAI benchmarks but complete failure in user studies regarding task completion time or comprehension.

## 6. Key Quote / Core Insight

"The mere presence of a complex explanation often acts as a dangerous heuristic for trustworthiness. If we exhaust users with comprehensive data dumps, they stop analyzing and start blindly agreeing. To build truly responsible AI, we must move from exhaustive transparency to selective, human-compatible communication that forces cognitive engagement and exposes when the model is wrong."

## 7. Additional Resources & References

* **Resource:** AIX360 (AI Explainability 360) - **Type:** Open-Source Toolkit - **Relevance:** Developed by IBM to provide a comprehensive suite of XAI algorithms for practitioners.
* **Resource:** InterpretML - **Type:** Open-Source Toolkit - **Relevance:** Microsoft's open-source library for training interpretable models and explaining black-box systems.
* **Resource:** *Thinking, Fast and Slow* by Daniel Kahneman - **Type:** Book - **Relevance:** The foundational psychological framework (System 1 vs. System 2) used to explain the phenomenon of human over-reliance on AI explanations.
* **Resource:** "Questioning the AI: Informing Design Practices for Explainable AI User Experiences" (Liao et al., CHI 2020) - **Type:** Research Paper - **Relevance:** The origin of the Question-Driven XAI design process and the comprehensive XAI Question Bank.
* **Resource:** "Connecting Algorithmic Research and Usage Contexts: A Perspective of Contextualized Evaluation for Explainable AI" (Liao et al., HCOMP 2022) - **Type:** Research Paper - **Relevance:** Provides the framework for matching XAI evaluation metrics to specific downstream usage contexts.
* **Resource:** "Explanation in Artificial Intelligence: Insights from the Social Sciences" (Tim Miller, Artificial Intelligence, 2019) - **Type:** Research Paper - **Relevance:** A critical survey highlighting that human explanations are naturally contrastive, social, and selective—principles often missing from algorithmic XAI.