# Conducting Usable Privacy and Security Studies: It's Complicated!
**Video Category:** Human-Computer Interaction / Security Research
## 0. Video Metadata
**Video Title:** Conducting Usable Privacy and Security Studies: It's Complicated!
**YouTube Channel:** Stanford Center for Professional Development
**Publication Date:** November 20, 2015
**Video Duration:** ~1 hour 11 minutes
## 1. Core Summary (TL;DR)
This presentation explores the complexities of conducting user studies at the intersection of human-computer interaction (HCI) and computer security/privacy. It demonstrates that assuming users will act rationally or understand security interfaces is fundamentally flawed, and that relying on expert intuition leads to ineffective designs. By using clever deception, realistic scenarios, and controlled experiments, researchers can uncover the root causes of security failures and design warnings, nudges, and authentication methods that align with human behavior and cognitive limitations.
## 2. Core Concepts & Frameworks
* **Concept:** Ecological Validity -> **Meaning:** The extent to which the environment and conditions of a user study mirror the real-world situations where the system will actually be used. -> **Application:** Creating deceptive lab scenarios (e.g., buying paper clips on Amazon and sending a simulated phishing email) so participants behave naturally without knowing they are being tested on security awareness.
* **Concept:** Active vs. Passive Warnings -> **Meaning:** Passive warnings are easily ignored "swat-away" dialogs, while active warnings force engagement or cognitive processing before the user is allowed to proceed. -> **Application:** Forcing a user to type a specific publisher name or wait out a 10-second timer before proceeding, ensuring they actually process the risk (see the sketch after this list).
* **Concept:** Privacy Premium -> **Meaning:** The additional amount of money a consumer is willing to pay to purchase an item from a vendor with better privacy practices. -> **Application:** Measuring if users will buy a privacy-sensitive item (a sex toy) from a slightly more expensive site if a search engine indicator explicitly shows it has better privacy policies.
* **Concept:** System-Assigned Passphrases -> **Meaning:** A password strategy where the system generates a random combination of words (e.g., "correct horse battery staple") rather than relying on the user to pick a phrase, to ensure high entropy. -> **Application:** Comparing the memorability and typing accuracy of 4 random common words against random characters and pronounceable passwords.
* **Concept:** Security Nudges -> **Meaning:** Interface design elements that do not block an action but gently encourage users to stop, think, and potentially alter their behavior to avoid security or privacy regrets. -> **Application:** Showing Facebook users profile pictures of who will see their post or enforcing a 10-second countdown before a post goes live to prevent impulsive sharing.
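The "active warning" pattern above can be made concrete with a short sketch. The following is a minimal, hypothetical browser-side implementation (identifiers such as `renderActiveWarning` are illustrative, not from the talk): the Continue button stays disabled until the user re-types the publisher name exactly, forcing them to read the one trust indicator that matters.

```typescript
// Minimal sketch of an "active" warning: the Continue button is disabled
// until the user re-types the publisher name exactly as displayed.
// All identifiers here are illustrative, not from the study's code.

interface ActiveWarningOptions {
  publisherName: string;   // the trust indicator the user must process
  onContinue: () => void;  // proceed with the risky action
  onCancel: () => void;    // abandon the action
}

function renderActiveWarning(opts: ActiveWarningOptions): HTMLElement {
  const dialog = document.createElement("div");
  dialog.innerHTML = `
    <p>This software is published by: <strong>${opts.publisherName}</strong></p>
    <p>Type the publisher's name exactly to continue:</p>
    <input type="text" id="publisher-input">
    <button id="continue-btn" disabled>Continue</button>
    <button id="cancel-btn">Cancel</button>`;

  const input = dialog.querySelector<HTMLInputElement>("#publisher-input")!;
  const cont = dialog.querySelector<HTMLButtonElement>("#continue-btn")!;

  // Enable Continue only on an exact match, so a misspelled publisher
  // like "Miicr0s0ft Corporation" must be consciously transcribed.
  input.addEventListener("input", () => {
    cont.disabled = input.value !== opts.publisherName;
  });
  cont.addEventListener("click", opts.onContinue);
  dialog.querySelector("#cancel-btn")!.addEventListener("click", opts.onCancel);
  return dialog;
}
```

Unlike a mouse swipe, transcription cannot become pure muscle memory, which is why this variant held up against habituation in the Mechanical Turk study described below.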
## 3. Evidence & Examples (Hyper-Specific Details)
* **[Deceptive Usability Testing / EROS Window System]:** A 2004 USENIX Security paper on the EROS Trusted Window System claimed usability testing was done by having Wesley Vanderburg, age 4, draw a stick figure in Microsoft Paint. This highlights the absurdity of non-HCI experts making usability claims.
* **[Privacy Bird Icon Misinterpretation]:** In a 2002 study of the "Privacy Bird" browser plugin, participants misinterpreted a green bird icon as meaning "websites where you can download music" and a red bird icon as "sites you don't want to let your kids go to," rather than their intended meanings of privacy policy matching.
* **[2007 Phishing Warnings Study / Deception]:** Participants were paid to buy paper clips on Amazon.com using their own accounts. During the study, they were sent a fake Amazon email ("amazonaccounts.net") claiming their order was delayed. If they clicked the link, a browser phishing warning appeared. The study found most users ignored the warnings, often closing the browser and clicking the link again repeatedly (up to 7 or 8 times), mistakenly believing the website itself was just temporarily broken.
* **[2008 SSL Certificate Warning Study / Context Dependency]:** To test if users could distinguish between safe and dangerous SSL warnings, researchers tested two contexts. Non-risky context: Accessing the CMU "Cameo" library catalog (which naturally threw a self-signed certificate warning). Risky context: Logging into their real online bank account (PNC Bank) to check their balance while a man-in-the-middle attack was simulated by removing the root certificate from the browser. In Firefox 3, users could not figure out the 4-step process to bypass the warning, completely blocking them from safe sites like the library.
* **[Forced Interaction Warnings / Mechanical Turk]:** To find a warning users couldn't ignore, a study with 2,227 Mechanical Turk workers had them play online games (e.g., "Mars Buggy Online", "Tom and Jerry Refrigerator Raid", "Colliderix Level Pack"). Game 3 required "installing software," triggering fake Windows Security warnings. The most effective warning required users to re-type the exact publisher name (e.g., "Miicr0s0ft Corporation") to proceed, significantly slowing them down and reducing installation rates in suspicious contexts. Swiping the mouse over the name failed because users swiped without reading.
* **[Willingness to Pay for Privacy / Search Engine Indicators]:** A study had 72 Pittsburgh residents buy items using a custom search engine ("Privacy Finder") with privacy meters ranging from red to green. For a benign item (AA Duracell batteries), users bought the cheapest option regardless of privacy. For a privacy-sensitive item (Pocket Rocket Jr. sex toy), when privacy information was present, half the users paid a "privacy premium" (up to 69 cents more) to buy from a site with a better privacy score. Researchers paid participants a fixed amount and let them keep the change to simulate true cost.
* **[Facebook Regrets / Privacy Nudges]:** A 6-week study with 28 participants tested Chrome plugins on Facebook. The "Timer Nudge" held posts for 10 seconds before publishing, with a countdown clock and a cancel button. The "Profile Picture Nudge" showed 5 random profile pictures of people who could see the post (e.g., friends, boss, mother). The "Sentiment Nudge" flagged negative language. The Picture and Timer nudges successfully caused users to edit or cancel impulsive posts, while the Sentiment nudge mostly annoyed people.
* **[XKCD Passphrase Assertion Study]:** A 1,476-participant Mechanical Turk study tested the XKCD 936 comic's claim that 4 random words are easier to remember and just as secure as random characters. The study tested 4 common words (e.g., "try there three come"), "noun-verb-adjective-noun" structures (e.g., "plan builds sure power"), 5 random characters, and pronounceable passwords (e.g., "tufritvi", "vadasabi"). Participants returned 2 days later to recall them. The results contradicted XKCD: passphrases were NOT easier to remember, resulted in more typing mistakes, and took longer to enter. Pronounceable passwords performed best for speed and accuracy.
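The entropy arithmetic behind the XKCD claim is easy to reproduce. Below is a back-of-the-envelope sketch; the wordlist and alphabet sizes are assumptions chosen to match the comic's 2048-word dictionary and common printable-ASCII setups, not necessarily the study's exact parameters.

```typescript
// Back-of-the-envelope entropy for the system-assigned schemes.
// A secret drawn uniformly at random as k independent picks from an
// alphabet of size n carries k * log2(n) bits of entropy.

// 4 random words from a 2048-word dictionary (the XKCD 936 setup):
const passphraseBits = 4 * Math.log2(2048);      // = 44 bits

// 5 random characters from the 94 printable ASCII symbols (no space):
const randomCharBits = 5 * Math.log2(94);        // ≈ 32.8 bits

// A pronounceable password of 4 consonant-vowel syllables (in the
// style of "tufritvi"), assuming 20 consonants and 6 vowels each:
const pronounceableBits = 4 * Math.log2(20 * 6); // ≈ 27.6 bits

console.log({ passphraseBits, randomCharBits, pronounceableBits });
```

The security side of the comic's claim holds: 44 bits comfortably exceeds the character-based alternatives. It was the memorability and typing-effort side that the study contradicted.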
## 4. Actionable Takeaways (Implementation Rules)
* **Rule 1: Use deception to achieve ecological validity.** - When testing security warnings, do not tell participants you are testing security. Give them a realistic primary task (like shopping or playing an online game) and trigger the security event unexpectedly to observe genuine reactions. Use easy alternative tasks (like providing a phone number alongside a URL) to ensure users aren't just pushing through warnings to finish the study.
* **Rule 2: Force cognitive interaction to bypass habituation.** - If a warning is critical, do not use a standard "OK/Cancel" dialog or simple swiping mechanics. Force the user to perform a cognitive task, such as typing the exact name of a software publisher (including intentional misspellings like "Miicr0s0ft"), to ensure they have read the relevant trust indicator.
* **Rule 3: Evaluate warnings in both risky and non-risky contexts.** - A warning is only successful if it stops users in a dangerous situation (e.g., banking man-in-the-middle) BUT allows them to easily proceed in a known safe situation (e.g., a university library self-signed certificate).
* **Rule 4: Design visual attractors to highlight key trust decisions.** - Do not bury critical decision-making information (like the actual URL or publisher name) in a wall of text. Use distinct colors (ANSI warning colors) or structural changes (slow reveals, animated connectors) to draw the eye directly to the entity the user needs to evaluate.
* **Rule 5: Use contextual nudges rather than hard blocks for social media.** - To prevent user regret, implement friction points such as a 10-second publishing delay or visual reminders of the audience (e.g., displaying random profile pictures) before a post goes live, giving the user a chance to self-censor impulsive thoughts (see the timer sketch after this list).
* **Rule 6: Do not rely on system-assigned random word passphrases.** - Despite popular belief, random word combinations are harder for users to type without errors and take longer to input. If system-assigned passwords are required for high entropy, utilize pronounceable nonsense words (e.g., "tufritvi") to balance security with typing usability.
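As a concrete illustration of Rule 5, here is a minimal sketch of the 10-second timer nudge (`publishWithTimerNudge` and its parameters are hypothetical names, not the study plugin's API): the post is held in a pending state with a visible countdown, and publication can be cancelled at any point before the timer expires.

```typescript
// Minimal sketch of the 10-second timer nudge: the post is held back,
// a countdown runs, and the user can cancel before it goes live.

function publishWithTimerNudge(
  post: string,
  publish: (post: string) => void,        // actually send the post
  onTick: (secondsLeft: number) => void,  // update the countdown UI
  delaySeconds = 10
): { cancel: () => void } {
  let remaining = delaySeconds;
  onTick(remaining);

  const interval = setInterval(() => {
    remaining -= 1;
    onTick(remaining);
    if (remaining <= 0) {
      clearInterval(interval); // countdown finished: the post goes live
      publish(post);
    }
  }, 1000);

  // Cancelling stops the countdown, so the post is never published.
  return { cancel: () => clearInterval(interval) };
}

// Usage: hold a post for 10 seconds; call handle.cancel() to stop it.
const handle = publishWithTimerNudge(
  "impulsive thought...",
  (p) => console.log("published:", p),
  (s) => console.log(`posting in ${s}s`)
);
```

The design choice is deliberate friction, not a block: the default outcome is still publication, but the delay creates a window for the second thoughts that the Facebook study showed users often have.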
## 5. Pitfalls & Limitations (Anti-Patterns)
* **Pitfall:** Assuming lay users share the mental models of security experts. -> **Why it fails:** Experts look for specific indicators (publisher names, URLs), while lay users interpret warnings as generic system errors or broken websites. -> **Warning sign:** Users repeatedly closing the browser and clicking the same malicious link expecting a different result.
* **Pitfall:** Testing security UI using non-representative subjects (kids or coworkers). -> **Why it fails:** People highly exposed to technology are not representative of the general public, and a 4-year-old child will navigate systems completely differently than a typical user. -> **Warning sign:** Evaluating a system's usability based on whether your own child can operate it.
* **Pitfall:** Using hypothetical "willingness to pay" surveys for privacy. -> **Why it fails:** Users will state they value privacy in a survey because it costs them nothing. When forced to spend real money, their behavior changes. -> **Warning sign:** Conducting surveys on privacy preferences without attaching real financial stakes to the decisions.
* **Pitfall:** Reimbursing study participants a flat amount regardless of their shopping choices. -> **Why it fails:** If the researcher covers the cost of any item, the participant has no incentive to save money, nullifying any realistic price-vs-privacy trade-off. -> **Warning sign:** Participants always buying the highest-priced, highest-privacy item because it's effectively "free."
* **Pitfall:** Designing warnings that can be dismissed with a mouse swipe. -> **Why it fails:** Users build muscle memory and will quickly learn to perform physical actions without reading the text. -> **Warning sign:** Warning dismissal rates drop initially but return to baseline as users habituate to the physical swiping motion.
* **Pitfall:** Relying on system-assigned passphrases for better memorability. -> **Why it fails:** Random words lack mnemonic connections, making them no easier to remember than random characters, while significantly increasing typing time and error rates. -> **Warning sign:** High failure rates and slow input times when users are forced to use system-generated passphrases.
## 6. Key Quote / Core Insight
"The model that people had was: 'There's something wrong with the website. The web browser is just giving me some warning. When web browsers give me warnings, it's usually because something is wrong with the website. And usually, if you just keep trying again, eventually it fixes itself and it will be fine.' They had no idea that it was something dangerous."
## 7. Additional Resources & References
* **Resource:** Privacy Bird - **Type:** Browser Plugin - **Relevance:** An early 2002 tool used to evaluate privacy policies, demonstrating how users misinterpreted red/green bird icons.
* **Resource:** EROS Trusted Window System (USENIX Security 2004) - **Type:** Paper - **Relevance:** Cited as an anti-pattern for usability testing, as the authors used a 4-year-old drawing a stick figure as their usability evaluation.
* **Resource:** Amazon Mechanical Turk - **Type:** Crowdsourcing Platform - **Relevance:** Used extensively by the researchers to run massive, inexpensive remote usability studies with deceptive scenarios.
* **Resource:** Privacy Finder - **Type:** Search Engine Tool - **Relevance:** A custom search engine built by CMU that overlaid privacy score "meters" onto search results to test users' willingness to pay a privacy premium.
* **Resource:** XKCD Comic #936 ("Password Strength") - **Type:** Comic/Concept - **Relevance:** The comic claiming "correct horse battery staple" is highly secure and easy to remember, which the CMU team empirically tested and contradicted.
* **Resource:** Symposium on Usable Privacy and Security (SOUPS) - **Type:** Conference - **Relevance:** Mentioned as the premier venue for publishing and finding more research on this specific topic.