# Developing Design Spaces for Visualization: Methodologies and Case Studies
**Video Category:** Human-Computer Interaction & Data Visualization
## 0. Video Metadata
**Video Title:** Human-Computer Interaction Seminar: Developing Design Spaces for Visualization
**YouTube Channel:** Stanford Center for Professional Development (scpd.stanford.edu)
**Publication Date:** March 4, 2022
**Video Duration:** ~1 hour 14 minutes
## 1. Core Summary (TL;DR)
This presentation outlines methodologies for creating and assessing "design spaces" (also known as taxonomies or typologies) to systematically structure solutions within visualization and Human-Computer Interaction (HCI). By identifying independent, cross-cutting axes of design choices, researchers and practitioners can map existing landscapes, identify unmet tooling needs, and systematically generate novel designs. The framework demonstrates how moving from unstructured collections of examples to formalized design spaces provides descriptive, evaluative, and generative power, ultimately translating academic analysis into actionable software tools.
## 2. Core Concepts & Frameworks
* **Concept: Design Space (Taxonomy/Typology)** -> **Meaning:** The imposition of a systematic structure onto a set of possibilities for a specific design problem. It identifies independent, orthogonal variables (axes or dimensions) that capture the central choices a designer must make. -> **Application:** Used to describe differences among existing designs, systematically reason about solutions, and increase cognitive efficiency by grouping similar instances to facilitate reasoning about classes rather than individual instances (referencing Paul Ralph).
* **Concept: Open Coding (Thematic Analysis)** -> **Meaning:** A qualitative research method where raw source material is systematically reviewed to generate a set of descriptive tags or categories. It is a bottom-up, iterative process of grouping, abstracting, and refining concepts. -> **Application:** Used as the primary mechanism to discover the dimensions of a design space by analyzing corpora of visualizations, software scripts, or research papers.
* **Concept: Tripartite Assessment Framework** -> **Meaning:** A method adapted from Michel Beaudouin-Lafon for evaluating the utility of a design space based on three criteria: *Descriptive power* (ability to precisely describe and distinguish a significant range of existing examples), *Evaluative power* (ability to assess multiple design alternatives for a specific purpose), and *Generative power* (ability to help designers create new designs or authoring tools). -> **Application:** Applied to validate newly created taxonomies, ensuring they move beyond academic categorization to inform the development of practical tools like recommender systems or authoring software.
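The bookkeeping side of open coding can be sketched in a few lines. This is a minimal illustration, not the method itself: the tags, the synonym merge, and the frequency threshold are hypothetical stand-ins for the iterative human judgment the talk describes.

```python
from collections import Counter

# Hypothetical first-pass tags from an open-coding session over a
# corpus of visualizations (names are illustrative, not from the talk).
raw_tags = [
    ["radial", "chronological"],
    ["linear", "chronological", "faceted"],
    ["spiral", "relative"],
    ["linear", "log-scale"],
]

# One refinement pass: merge near-duplicate tags into broader codes,
# mimicking the grouping/abstracting step of thematic analysis.
merge = {"log-scale": "logarithmic"}
normalized = [[merge.get(t, t) for t in tags] for tags in raw_tags]

# Tag frequencies hint at which codes are stable enough to become
# candidate axis values in the emerging design space.
counts = Counter(t for tags in normalized for t in tags)
print(counts.most_common(3))
```

In practice this loop runs many times, with tags split, merged, and renamed between passes until the category set stabilizes.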
## 3. Evidence & Examples (Hyper-Specific Details)
* **[Timelines Revisited Design Space]:** Authored by Matt Brehmer, Bongshin Lee, Benjamin Bach, and Nathalie Henry Riche. The team assembled a corpus of 145 timeline visualizations and tools. Through open coding, they established a 3-axis design space: Representation (Linear, Radial, Grid, Spiral, Arbitrary), Scale (Chronological, Relative, Logarithmic, Sequential, Sequential + Interim Duration), and Layout (Unified, Faceted, Segmented, Faceted + Segmented). Out of 100 possible theoretical combinations ($5 \times 5 \times 4$), they identified 20 as "viable" designs. They validated descriptive power against 118 additional timelines and demonstrated generative power by building the "Timeline Storyteller" software (timelinestoryteller.com), which was subsequently distributed as a Microsoft Power BI add-on.
* **[Genomic Epidemiology Visualization Typology (GEVIT)]:** Authored with Anamaria Crisan and Jenn Gardy. Focused on the specific domain of GenEpi. The team used text mining on 18,000 articles, sampled down to ~200 articles, and extracted 800 figures. Open coding revealed three axes: Chart Type (e.g., Phylogenetic Tree, Category Stripe), Chart Combination (e.g., Spatially Aligned), and Chart Enhancement (e.g., adding connection marks). The study revealed that over 80% of figures used enhancements and highlighted a shortfall: researchers heavily overused text to show relationships because existing tools lacked capabilities for complex visual combinations. This led to the creation of "GEVITRec", an automatic recommender system.
* **[Multi-Table Data Wrangling in Computational Journalism]:** Authored with Steve Kasica and Charles Berret. A technical observation study to understand data wrangling practices. The team mined GitHub and Observable to find 1,301 journalists' repositories, curated down to 225, and performed deep qualitative coding on scripts from 50 repos. By mapping complex "data-flow sketches", they developed a taxonomy based on Object Type (Table, Row, Column) versus Input/Output Cardinality (Create $0 \rightarrow 1$, Delete $1 \rightarrow 0$, Transform $1 \rightarrow 1$, Separate $1 \rightarrow n$, Combine $n \rightarrow 1$). This bottom-up taxonomy was cross-checked against 30 previous classification systems. The analysis revealed a critical tooling gap: journalists frequently performed highly complex multi-table joins that were entirely unsupported by interactive wrangling tools like Trifacta or Tableau Prep.
* **[Multi-Level Typology of Abstract Visualization Tasks]:** Authored with Matt Brehmer. Developed to bridge the gap between low-level interactions (e.g., "retrieve value") and high-level cognitive goals (e.g., "integrate insight"). The team used reflective synthesis and open coding of literature (design study papers) rather than empirical user studies. The resulting design space categorizes tasks into three axes: *Why* (Consume/Produce, Search, Query), *What* (Input/Output data types), and *How* (Encode, Manipulate, Facet, Reduce). This comprehensive taxonomy was adopted widely across the visualization community and forms the backbone of Munzner's "Visualization Analysis and Design" textbook.
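The Object Type versus Input/Output Cardinality taxonomy from the data-wrangling study lends itself to a direct sketch. The operation names and the example steps below are illustrative assumptions, not the study's actual coding scheme:

```python
# Classify a wrangling step by how many objects (e.g., tables) it
# consumes and produces, following the Create 0->1, Delete 1->0,
# Transform 1->1, Separate 1->n, Combine n->1 axis described above.
def classify(n_in: int, n_out: int) -> str:
    if n_in == 0 and n_out == 1:
        return "create"
    if n_in == 1 and n_out == 0:
        return "delete"
    if n_in == 1 and n_out == 1:
        return "transform"
    if n_in == 1 and n_out > 1:
        return "separate"
    if n_in > 1 and n_out == 1:
        return "combine"
    return "unclassified"

# Hypothetical steps from a journalist's script: (name, tables in, tables out)
steps = [("load_csv", 0, 1), ("filter_rows", 1, 1),
         ("split_by_year", 1, 3), ("multi_table_join", 3, 1)]
for name, n_in, n_out in steps:
    print(f"{name}: {classify(n_in, n_out)}")
```

The same classification applies at the row and column level of the Object Type axis; the tooling gap the study found shows up as "combine" steps with large `n_in` that interactive wranglers could not express.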
## 4. Actionable Takeaways (Implementation Rules)
* **Rule 1: Build design spaces bottom-up via open coding** - Assemble a large, representative corpus of actual artifacts (e.g., 145 timeline tools, 800 published figures, or 50 software scripts) and iteratively generate tags based on observed properties, rather than imposing top-down theoretical categories.
* **Rule 2: Filter theoretical combinations for practical viability** - Once independent axes are defined, multiply the dimensions to map the total theoretical space (e.g., $5 \times 5 \times 4 = 100$ combinations). Systematically evaluate each combination against criteria like purposefulness and interpretability to identify the subset of truly viable designs (e.g., 20 viable timeline configurations).
* **Rule 3: Validate descriptive power against a fresh dataset** - Prove the utility of a taxonomy by taking a hold-out test set (e.g., 118 newly gathered timelines) and verifying that the design space can accurately classify and describe every instance without requiring new categories.
* **Rule 4: Leverage taxonomies to identify software tooling gaps** - Map user workflows identified through qualitative coding against the capabilities of current software. Where user practices (like journalists' multi-table data flows) diverge from tool support, utilize that gap to design new, targeted authoring or recommender systems.
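Rule 2's arithmetic can be run directly over the timeline axes from Section 3. The viability predicate below is a stand-in only: the paper's actual filter was expert judgment about purposefulness and interpretability, not a mechanical rule.

```python
from itertools import product

# Axes of the Timelines Revisited design space (Section 3).
representation = ["linear", "radial", "grid", "spiral", "arbitrary"]
scale = ["chronological", "relative", "logarithmic", "sequential",
         "sequential+interim"]
layout = ["unified", "faceted", "segmented", "faceted+segmented"]

# Full theoretical space: 5 * 5 * 4 = 100 combinations.
space = list(product(representation, scale, layout))

# Stand-in viability check (illustrative only): rule out spirals
# paired with non-chronological scales.
def viable(rep, sc, lay):
    return not (rep == "spiral" and sc != "chronological")

viable_designs = [combo for combo in space if viable(*combo)]
print(len(space), "theoretical,", len(viable_designs), "pass the stand-in filter")
```

Enumerating the space mechanically and then pruning it with explicit criteria is what makes the "20 viable of 100" claim auditable rather than anecdotal.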
## 5. Pitfalls & Limitations (Anti-Patterns)
* **Pitfall:** Developing taxonomies with solely descriptive capabilities. -> **Why it fails:** A taxonomy that only categorizes existing work functions as an academic exercise but fails to drive innovation or tool creation. -> **Warning sign:** The framework cannot be used to generate novel designs or serve as the architecture for an authoring tool or automated recommender system.
* **Pitfall:** Defining tasks at a single, incorrect level of abstraction. -> **Why it fails:** Describing user activities only as low-level interactions ("click here") strips away user intent, while describing them only as high-level goals ("understand data") is too vague to inform interface design. -> **Warning sign:** Inability to articulate both the *why* (the cognitive goal) and the *how* (the specific visual encoding or interaction technique) of a user's workflow.
* **Pitfall:** Relying exclusively on finalized outputs for analysis. -> **Why it fails:** Analyzing only finished visualizations (like published charts) obscures the complex, messy processes required to produce them. -> **Warning sign:** Missing critical bottlenecks in the user's workflow, such as the massive amount of "hidden" data wrangling required before visualization can occur, as revealed by analyzing journalists' raw code repositories rather than their final articles.
## 6. Key Quote / Core Insight
A well-structured design space acts as a cognitive accelerator, allowing designers to move beyond point-by-point analysis of individual examples and instead reason systematically about entire classes of solutions, ultimately translating abstract categorization into generative software tools.
## 7. Additional Resources & References
* **Resource:** Timeline Storyteller (timelinestoryteller.com) - **Type:** Tool - **Relevance:** An open-source authoring tool demonstrating the generative power of the timeline design space, also available as a Microsoft Power BI add-on.
* **Resource:** Visualization Analysis and Design by Tamara Munzner - **Type:** Book - **Relevance:** A comprehensive textbook built upon the abstract task typologies and design spaces discussed in the presentation.
* **Resource:** Designing Interaction, not Interfaces by Michel Beaudouin-Lafon (AVI 2004) - **Type:** Paper - **Relevance:** Source of the framework for assessing descriptive, evaluative, and generative power.
* **Resource:** Toward Methodological Guidelines for Process Theories & Taxonomies in Software Engineering by Paul Ralph (IEEE TSE 2020) - **Type:** Paper - **Relevance:** Provides the methodological distinction between taxonomies (theories for understanding) and process theories (theories for how things happen).