# Human-Computer Interaction Seminar: Quo Vadis Augmented Reality?

**Video Category:** Technology Research / Human-Computer Interaction

## 📋 0. Video Metadata

**Video Title:** Human-Computer Interaction Seminar: Quo Vadis Augmented Reality?
**YouTube Channel:** Stanford Center for Professional Development
**Publication Date:** October 25, 2019
**Video Duration:** ~45 minutes

## 📝 1. Core Summary (TL;DR)

This seminar explores the gap between the promise of Mixed Reality (MR) to replace resource-intensive physical travel and its current practical limitations. The speaker argues that while display and tracking technologies have improved significantly thanks to massive industry investment, the user interface and physical interaction layer remain largely unsolved, producing an "uncanny valley" of interaction. To address this, the presentation details four specific research projects focusing on non-optical sensing (soft data gloves and IMU-based body tracking), intent-driven UI adaptation using gaze data, and dense haptic feedback that makes virtual objects feel real.

## 2. Core Concepts & Frameworks

* **The Uncanny Valley of Mixed Reality:**
  -> **Meaning:** The phenomenon where a virtual object looks highly realistic but lacks the expected physical interaction or context, creating a jarring, nightmare-like cognitive conflict for the user.
  -> **Application:** A framework for evaluating MR systems; if a user can see a virtual sheep but their hand passes right through it, the visual fidelity actively hurts the experience because the physical expectation is violated.
* **Data-Driven Pose Estimation:**
  -> **Meaning:** A machine learning approach that reconstructs human body or hand poses by mapping raw sensor data (such as material stretch or acceleration) to 3D joint coordinates with neural networks, rather than relying on line-of-sight optical cameras (a minimal sketch follows this list).
  -> **Application:** Enables wearable sensors (soft gloves or IMU straps) to track user movements even under heavy occlusion, e.g., reaching into a pocket or behind a physical object.
* **Context-Aware Online Adaptation:**
  -> **Meaning:** An interface design paradigm in which the system dynamically shows or hides information (such as labels) based on the user's inferred intent, minimizing visual clutter.
  -> **Application:** Implemented via a Semi-Markov Decision Process that analyzes a user's gaze trajectory to predict which virtual object they are interested in, revealing details only for that object.
* **Bistable Haptic Actuation:**
  -> **Meaning:** A mechanical design using a permanent magnet and latching plates that draws power only while switching states, consuming zero power to hold either the "open" or "closed" state.
  -> **Application:** Used to build dense, low-power haptic feedback arrays on wearable gloves, capable of rendering both brief textures (high-frequency switching) and sustained physical contact (latching).
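To make the data-driven pose estimation concept concrete, here is a minimal sketch of an encoder-decoder network that regresses 3D joint positions from raw wearable-sensor readings. Only the encoder-decoder idea, the 44-element sensor count, and the min-max normalization step (Rule 2 in Section 4) come from the talk summary; the class name `PoseRegressor`, the layer sizes, the joint count, and the helper `minmax_normalize` are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_SENSORS = 44   # capacitive stretch elements on the glove (from the talk)
N_JOINTS = 21    # assumed hand-joint count; 3 coordinates per joint

def minmax_normalize(raw: torch.Tensor) -> torch.Tensor:
    """Per-channel min-max normalization, as in Rule 2 (Section 4)."""
    lo = raw.min(dim=0).values
    hi = raw.max(dim=0).values
    return (raw - lo) / (hi - lo + 1e-8)

class PoseRegressor(nn.Module):
    """Encoder-decoder mapping sensor readings to 3D joint positions."""
    def __init__(self):
        super().__init__()
        # Encoder compresses raw readings into a latent pose code.
        self.encoder = nn.Sequential(
            nn.Linear(N_SENSORS, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
        )
        # Decoder expands the latent code into per-joint coordinates.
        self.decoder = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, N_JOINTS * 3),
        )

    def forward(self, sensors: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(sensors)).view(-1, N_JOINTS, 3)

# A batch of raw capacitance readings -> normalized -> joint positions.
model = PoseRegressor()
pose = model(minmax_normalize(torch.rand(64, N_SENSORS)))  # (64, 21, 3)
```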
## 3. Evidence & Examples (Hyper-Specific Details)

* **2003 "Virtual Sheep" Mixed Reality Demo (TU Munich):** The speaker demonstrated an early MR system featuring virtual sheep reacting to physical objects (Duplo sheep) through a tangible user interface and a head-mounted display. When the demo was shown to Thad Starner, he warned not to "overdo the weird stuff," and the speaker's sister remarked that the cognitive disconnect between seeing the sheep and not feeling them gave her "nightmares."
* **Microsoft HoloLens 2 Reveal Demo (Julia Schwartz):** A video clip demonstrated a user's visceral reaction to MR: the very first thing a user does when wearing a high-fidelity headset is reach out and try to touch the holograms. This footage was used to argue that hand tracking and physical interaction are non-negotiable for MR.
* **Stretchable Capacitive Data Glove (Glauser et al., SIGGRAPH '19):** A fully soft, stretchable glove designed to overcome the line-of-sight limitations of headset cameras. It features 44 sensing elements made of laser-patterned carbon-black conductive silicone layered with non-conductive silicone. As the hand deforms, the overlap area of the capacitors changes, altering capacitance. An encoder-decoder neural network maps this data to hand poses in real time. Footage showed the glove accurately tracking a hand reaching behind a potted plant and into a pocket, tasks impossible for optical tracking.
* **Full-Body Pose Estimation via 6 IMUs (Huang et al., SIGGRAPH Asia '19):** A system using only six Inertial Measurement Units (IMUs) on the body (wrists, ankles, head, pelvis) to reconstruct full-body pose. Because IMUs provide only orientation and acceleration (an underconstrained pose space), a Bidirectional Recurrent Neural Network (BiRNN) was used to exploit temporal consistency. Demonstrations showed accurate tracking of complex, occluded poses, such as a user sitting at a desk with their lower body hidden from external cameras.
* **Learning Cooperative Personalized Policies from Gaze Data (Gebhardt et al., UIST '19):** A reinforcement learning system designed to reduce visual clutter in MR grocery-shopping and visual-search tasks. Instead of manual rules, the system used recorded human gaze behavior (ballistic eye movements between fixations) as the "environment." The agent was rewarded (+10) for showing a label right before the user looked at it, and penalized (-10) for showing a label unnecessarily (a toy sketch of this reward scheme follows this list). Footage showed the system successfully hiding all labels except those for the specific wine bottles the user was actively scanning.
* **Miniature Bistable Haptic Actuators (Pece et al., UIST '17):** A demonstration of a haptic glove equipped with miniature electromagnetic actuators. Because they are bistable, they consume no power to hold a state. The video demonstrated two modes: "Pulse mode" (1.5% speed) to simulate the feeling of texture as a hand brushed over virtual bamboo, and "Contact mode," in which the actuators lock into place to simulate the sustained physical pressure of grasping a virtual coffee mug.
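The reward structure in the Gebhardt et al. example lends itself to a toy sketch. Only the +10/-10 values and the idea of treating recorded gaze trajectories as the environment come from the talk; the per-object, per-timestep framing and the function names are simplifying assumptions (the actual system operates on a Semi-Markov Decision Process over gaze data).

```python
def label_reward(label_shown: bool, fixated_next: bool) -> float:
    """Reward for one (object, time step) labeling decision."""
    if label_shown and fixated_next:
        return 10.0    # label appeared right before the user looked at it
    if label_shown and not fixated_next:
        return -10.0   # clutter: label shown but never needed
    return 0.0         # hidden labels are neutral

def episode_return(decisions, fixations) -> float:
    """Sum rewards over one recorded gaze trajectory (the 'environment')."""
    return sum(label_reward(d, f) for d, f in zip(decisions, fixations))

# The agent shows labels for objects 2 and 4, but the user only fixates
# object 2, so the episode return is 10 - 10 = 0.
print(episode_return([False, True, False, True],
                     [False, True, False, False]))
```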
## 4. Actionable Takeaways (Implementation Rules)

* **Rule 1: Eliminate Line-of-Sight Dependencies for Tracking** - **[Action]** -> **[Using wearable soft sensors or IMUs]** -> **[Resulting in robust tracking during physical occlusions]**. Do not rely solely on inside-out headset cameras for hand tracking; they fail when users manipulate objects close to their body, in the dark, or behind physical obstacles.
* **Rule 2: Normalize Sensor Data for Personalization** - **[Action]** -> **[Applying min-max normalization to raw sensor inputs before feeding them to a neural network]** -> **[Resulting in models that generalize across different users]**. When building wearable sensors such as data gloves, normalize the data to account for different hand sizes and fit, so a single trained model works for multiple users without extensive recalibration (cf. the normalization helper in the sketch after Section 2).
* **Rule 3: Use Temporal Consistency to Solve Underconstrained Data** - **[Action]** -> **[Utilizing recurrent neural networks (RNNs) with sliding windows]** -> **[Resulting in accurate position tracking from orientation-only sensors]**. When working with IMUs that lack absolute positional data, leverage the time-series history of the movement to predict the most physically probable joint positions.
* **Rule 4: Treat Human Behavior as the RL Environment** - **[Action]** -> **[Using recorded human gaze trajectories as the training environment for an RL agent]** -> **[Resulting in UI adaptation policies learned without manual labeling]**. To build smart UIs that show and hide information appropriately, train the system against natural human physiological responses rather than hard-coding spatial rules.
* **Rule 5: Implement Bistable Actuators for Wearable Haptics** - **[Action]** -> **[Using latching permanent magnets]** -> **[Resulting in the ability to render sustained contact without draining battery power]**. For dense, wearable haptic arrays, use bistable mechanical designs so that power is consumed only during state transitions, preventing rapid battery drain and overheating (a hypothetical driver sketch follows this list).
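Rule 5 translates naturally into control code. Below is a hypothetical driver sketch for a single bistable actuator channel, illustrating the two modes from the Pece et al. demo in Section 3. The class and method names and the timings are invented for illustration; the only property taken from the talk is that power is consumed solely during state transitions, so holding either state is free.

```python
import time

class BistableActuator:
    """One channel of a latching electromagnetic haptic actuator."""

    def __init__(self, channel: int):
        self.channel = channel
        self.closed = False  # the permanent magnet latches the "open" state

    def _switch(self, close: bool) -> None:
        # A real driver would pulse the coil here; only this transition
        # consumes power, because the magnet latches the new state.
        if self.closed != close:
            self.closed = close

    def contact(self) -> None:
        """Latch closed to render sustained contact (e.g., grasping a mug)."""
        self._switch(True)

    def release(self) -> None:
        """Latch open again, ending the contact sensation."""
        self._switch(False)

    def pulse(self, n: int = 5, period_s: float = 0.02) -> None:
        """Rapid open/close cycles approximate texture under a moving hand."""
        for _ in range(n):
            self._switch(True)
            time.sleep(period_s / 2)
            self._switch(False)
            time.sleep(period_s / 2)

# "Pulse mode" for texture, "contact mode" for sustained pressure.
actuator = BistableActuator(channel=0)
actuator.pulse(n=10)   # brushing over virtual bamboo
actuator.contact()     # grasping a virtual mug; holding the state costs no power
actuator.release()
```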
## 5. Pitfalls & Limitations (Anti-Patterns)

* **Pitfall:** Over-relying on "gorilla arm" gestures for MR interaction.
  -> **Why it fails:** Holding the arms up in the air to interact with headset-tracked holograms causes rapid physical fatigue.
  -> **Warning sign:** Users drop their arms out of the camera's field of view, causing the system to lose tracking and breaking the interaction.
* **Pitfall:** Plastering the MR field of view with permanent informational labels.
  -> **Why it fails:** Constant text overlays create severe visual clutter, overwhelming the user and hiding the physical environment, which is dangerous in real-world settings (e.g., walking near traffic).
  -> **Warning sign:** Users complain that they cannot see their surroundings, or the UI feels like a cluttered desktop screen rather than an integrated reality.
* **Pitfall:** Delivering high visual fidelity without corresponding haptic feedback.
  -> **Why it fails:** It triggers the "uncanny valley" of interaction: when an object looks perfectly real but the user's hand passes right through it, the result is a jarring cognitive conflict.
  -> **Warning sign:** Users report that the experience feels "creepy," "ghostly," or gives them "nightmares."
* **Pitfall:** Using camera-based hand tracking for fine-grained physical object manipulation.
  -> **Why it fails:** When a user grasps a real object (like a mug or a tool), the object occludes the fingers from the headset's cameras, causing tracking to instantly fail or jitter.
  -> **Warning sign:** The virtual hand model disappears or contorts wildly the moment the user touches a physical prop.

## 6. Key Quote / Core Insight

"If you provide a visually realistic virtual object, the very first thing people do is reach out and try to touch it. If their hand passes right through, it creates a visceral, cognitive conflict—an uncanny valley of interaction—that can literally give people nightmares. We have to solve the physical interface, not just the display."

## 7. Additional Resources & References

* **Resource:** "Stretch Sensing: Fully Soft Data Glove" (Glauser et al., SIGGRAPH 2019)
  - **Type:** Academic Paper
  - **Relevance:** Details the fabrication and machine learning pipeline for the stretchable capacitive data glove.
* **Resource:** "Full-Body Pose Estimation from 6 IMUs" (Huang et al., SIGGRAPH Asia 2019)
  - **Type:** Academic Paper
  - **Relevance:** Explains the BiRNN architecture for resolving the underconstrained pose space of sparse wearable sensors.
* **Resource:** "Context-Aware Online Adaptation for MR" (Gebhardt et al., UIST 2019)
  - **Type:** Academic Paper
  - **Relevance:** Details the reinforcement learning approach to adapting UI labels based on human gaze trajectories.
* **Resource:** "Miniature Bistable Haptic Actuators" (Pece et al., UIST 2017)
  - **Type:** Academic Paper
  - **Relevance:** Explains the hardware design for low-power, high-density wearable haptic feedback.
* **Resource:** "Holoportation" (Orts-Escolano et al.)
  - **Type:** Academic Paper / Project
  - **Relevance:** Cited as an example of state-of-the-art real-time 3D surface reconstruction for capturing humans.
* **Resource:** "ScanNet" (Dai et al.)
  - **Type:** Academic Paper / Dataset
  - **Relevance:** Cited as an example of state-of-the-art semantic scene understanding and 3D mapping.