# Analysis methods in NLP: Probing
**Video Category:** Natural Language Processing (NLP) Tutorial / Machine Learning Analysis
## 0. Video Metadata
**Video Title:** Analysis methods in NLP: Probing
**YouTube Channel:** Stanford Engineering
**Publication Date:** Not shown in video
**Video Duration:** ~11.5 minutes
## 1. Core Summary (TL;DR)
This video introduces "probing," a structural evaluation method used to analyze the hidden representations of complex NLP models like BERT to understand what linguistic features they latently encode. By training a small, supervised "probe" model on top of the frozen internal layers of a target model, researchers can test if concepts like part-of-speech or sentiment are structurally present inside the black box. However, the methodology comes with strict limitations: complex probes may simply learn the task from scratch rather than extracting existing knowledge, and probes cannot prove that the model actually utilizes the encoded information to make its final predictions. Understanding these constraints is essential for practitioners engaging in "BERTology" to accurately interpret model behavior.
## 2. Core Concepts & Frameworks
* **Concept:** Probing -> **Meaning:** The technique of using a small, supervised model (the probe) trained on the internal, hidden representations of a larger, frozen pre-trained model (the target model) to determine what specific information is latently encoded inside those layers. -> **Application:** Used in "BERTology" to analyze if a pre-trained language model has internally organized data to represent syntactic structures, sentiment, or entity types without being explicitly trained to do so.
* **Concept:** Control Task -> **Meaning:** A synthetic diagnostic task that shares the exact same input/output format and data distribution as the target probing task, but replaces the true labels with randomized, fixed assignments. -> **Application:** Used as a baseline to test the memorization capacity of a probe model; if a probe can easily solve the control task, it is likely too complex and is learning the data mapping rather than extracting latent features.
* **Concept:** Probe Selectivity -> **Meaning:** A metric defined as the difference between a probe's performance (accuracy) on the actual target task and its performance on the randomized control task. -> **Application:** Used to justify the chosen architecture of a probe; a reliable probe must demonstrate high selectivity, indicating it relies on the target model's latent structures rather than its own capacity to memorize random labels.
## 3. Evidence & Examples (Hyper-Specific Details)
* **[Generic Transformer Probing Setup / Visual Demonstration]:** The speaker shows a diagram of a generic transformer with an input sequence `[a, c, f, m, r, w, t]` passing through three layers. To probe for information, a specific hidden representation `h` from the middle layer is selected. The target model's parameters are frozen, and a `SmallLinearModel(h)` is trained to output a specific `task` label (e.g., predicting sentiment from that specific node). For sequence problems like part-of-speech tagging or Named Entity Recognition (NER), an entire layer of outputs (or multiple layers) is used as the basis for the probe model instead of a single node.
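To make this setup concrete, here is a minimal PyTorch sketch (not shown in the video) of a linear probe over a frozen Hugging Face `bert-base-uncased` model; the layer index, the sentence-level sentiment labels, and the use of the `[CLS]` position as the single node `h` are illustrative assumptions:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Target model: load once, freeze, and only read its hidden states.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
target = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
target.eval()
for p in target.parameters():
    p.requires_grad = False  # the target model is never updated

LAYER = 6        # assumption: probe a middle layer
NUM_LABELS = 2   # assumption: binary sentiment labels

# The probe itself: a small linear model over a single hidden vector h.
probe = torch.nn.Linear(target.config.hidden_size, NUM_LABELS)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

def probe_training_step(sentences, labels):
    with torch.no_grad():  # hidden representations are treated as fixed inputs
        enc = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
        hidden = target(**enc).hidden_states[LAYER]  # (batch, seq_len, dim)
        h = hidden[:, 0, :]                          # one node per example ([CLS] position)
    loss = loss_fn(probe(h), torch.tensor(labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For token-level tasks such as POS tagging or NER, the same probe would instead be applied at every position of `hidden`, matching the "entire layer of outputs" variant described above.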
* **[Hewitt and Liang (2019) Control Task Design / Spoken & Visual Evidence]:** The video details three specific control tasks designed to test probe capacity:
* *Word-sense classification:* Words are assigned random, fixed "senses" regardless of context.
* *Part-of-speech (POS) tagging:* Words are assigned random, fixed POS tags drawn from the actual tag vocabulary.
* *Parsing:* Edges are assigned randomly using simple strategies to create pseudo-parses linking different word pairs.
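A minimal sketch of how such control labels can be generated, following the POS-tagging variant above (the function names and corpus format are illustrative assumptions):

```python
import random

def make_control_labels(vocab, label_set, seed=0):
    """Assign every word type a random but FIXED label, independent of context."""
    rng = random.Random(seed)
    return {word: rng.choice(sorted(label_set)) for word in vocab}

def relabel_corpus(tagged_sentences, control_labels):
    """Replace true tags with the control mapping; inputs and format stay identical."""
    return [[(word, control_labels[word]) for word, _ in sentence]
            for sentence in tagged_sentences]

# Example: the POS control task draws random, fixed tags from the real tag vocabulary.
corpus = [[("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]]
vocab = {w for sent in corpus for w, _ in sent}
tags = {t for sent in corpus for _, t in sent}
control_corpus = relabel_corpus(corpus, make_control_labels(vocab, tags))
```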
* **[Probe Selectivity Chart (Hewitt and Liang 2019) / Visual Data]:** A chart maps "MLP Hidden Units (Complexity)" (x-axis: 2, 4, 10, 45, 1000) against "Accuracy" (y-axis: 0.30 to 0.90). The chart plots Task Accuracy (light blue line) and Control Task Accuracy (red line).
* At **2 hidden units**: Task accuracy is high (~0.85), and control accuracy is low (~0.40), resulting in high selectivity (large gap).
* At **1000 hidden units**: Task accuracy rises slightly, but control accuracy spikes to nearly match it. Selectivity approaches zero, proving the 1000-unit MLP probe is simply memorizing the task and not diagnosing the target model.
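Selectivity itself is simply the gap between the two curves; a minimal sketch using approximate values read off the chart (illustrative, not exact figures):

```python
def selectivity(task_accuracy, control_accuracy):
    """Hewitt and Liang (2019): selectivity = task accuracy - control-task accuracy."""
    return task_accuracy - control_accuracy

# Approximate values read off the chart above (illustrative only):
print(selectivity(0.85, 0.40))  # 2 hidden units   -> ~0.45: high selectivity, trustworthy probe
print(selectivity(0.88, 0.86))  # 1000 hidden units -> ~0.02: the probe is memorizing, not diagnosing
```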
* **[No Causal Inference Limitation (Belinkov and Glass 2019; Vig et al. 2020) / Worked Counterexample]:** A small model, shown diagrammatically, takes three integers ($x$, $y$, $z$) and is meant to compute their sum.
* Hidden layer $L_1$ receives $x$ and $y$. A probe on $L_1$ perfectly computes $x+y$.
* Hidden layer $L_2$ receives $y$ and $z$. A probe on $L_2$ perfectly computes $z$.
    * *The mathematical reality:* The model's final output is computed with a readout weight vector $w = [0, 1, 0]^T$ over the hidden units, which zeroes out the contributions of the first and third positions; the output therefore depends only on the middle hidden state.
* *Result:* Even though $L_1$ perfectly encodes "$x+y$" and $L_2$ perfectly encodes "$z$", neither representation has *any* causal impact on the model's actual output. The probe found perfect latent information that the model completely ignores.
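Below is a small NumPy sketch of one construction consistent with this description (the specific weights are assumptions, not taken verbatim from the video): the middle hidden unit carries the sum the model actually uses, while the outer units expose information that probes recover perfectly but the readout ignores.

```python
import numpy as np

# Assumed weights: h = [x + y,  x + y + z,  z], output = w . h with w = [0, 1, 0].
W_in = np.array([[1.0, 1.0, 0.0],   # first hidden unit:  x + y
                 [1.0, 1.0, 1.0],   # middle hidden unit: x + y + z (the full sum)
                 [0.0, 0.0, 1.0]])  # third hidden unit:  z
w_out = np.array([0.0, 1.0, 0.0])   # the readout ignores the first and third units

x, y, z = 2, 3, 5
h = W_in @ np.array([x, y, z])
print(h)          # [ 5. 10.  5.] -> probes can read x+y and z off h perfectly
print(w_out @ h)  # 10.0          -> the output is the sum, computed from the middle unit only

# Causal check: corrupting the probed units leaves the output unchanged.
h_corrupted = h.copy()
h_corrupted[0] += 100.0
h_corrupted[2] -= 100.0
print(w_out @ h_corrupted)  # still 10.0
```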
## 4. Actionable Takeaways (Implementation Rules)
* **Rule 1: Freeze the target model parameters** - When building a probe, you must freeze all parameters of the target model (e.g., BERT). The goal is to evaluate the existing representations as fixed inputs, not to fine-tune the model for a new task.
* **Rule 2: Restrict probe architecture complexity** - Always default to the simplest possible model for your probe, such as a small linear model. Using deep, high-capacity neural networks as probes will obscure whether the information was latently encoded in the target model or simply learned from scratch by the probe itself.
* **Rule 3: Always implement a randomized control task** - Never report probing results without simultaneously running the exact same probe architecture on a control task where the labels have been fixed randomly.
* **Rule 4: Optimize for probe selectivity, not just accuracy** - Evaluate the validity of your probe by calculating its selectivity (Task Accuracy minus Control Task Accuracy). Only accept findings from probes that maintain high selectivity.
* **Rule 5: Do not claim causality from probing results** - Treat probing results strictly as evidence of correlation and latent encoding. To determine if the model actually *uses* that encoded information for its outputs, you must transition to feature attribution methods or causal interventions.
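As a hedged illustration of Rule 5, the sketch below goes one step beyond probing by intervening on a hidden representation (projecting out a probed feature direction via a forward hook) and checking whether the prediction moves. The checkpoint name, layer index, and random placeholder direction are assumptions; principled alternatives include the causal mediation analysis of Vig et al. (2020).

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumptions: a fine-tuned sentiment checkpoint and a previously learned probe
# direction; both are placeholders for whatever model and probe you analyzed.
MODEL_NAME = "textattack/bert-base-uncased-SST-2"
LAYER = 6
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

# Placeholder: in practice this would be the probe's learned weight direction.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()

def remove_direction(module, inputs, output):
    """Project the probed direction out of every token representation at this layer."""
    hidden = output[0]
    proj = (hidden @ direction).unsqueeze(-1) * direction
    return (hidden - proj,) + output[1:]

enc = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    baseline = model(**enc).logits.softmax(-1)

handle = model.bert.encoder.layer[LAYER].register_forward_hook(remove_direction)
with torch.no_grad():
    intervened = model(**enc).logits.softmax(-1)
handle.remove()

# If the prediction barely changes, the probed feature may be encoded but unused.
print(baseline, intervened)
```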
## 5. Pitfalls & Limitations (Anti-Patterns)
* **Pitfall:** Deploying a deep Multi-Layer Perceptron (MLP) as a probe to maximize task accuracy. -> **Why it fails:** High-capacity models have enough parameters to learn complex mappings from raw input features to the target labels independently. They become successful supervised learners rather than analytical probes. -> **Warning sign:** The probe achieves near-perfect accuracy on the target task, but also achieves high accuracy on a control task with randomized, nonsensical labels.
* **Pitfall:** Concluding that a model's behavior is driven by a feature because a probe successfully found it in a hidden layer. -> **Why it fails:** Information can be perfectly encoded in an intermediate layer's representation but subsequently assigned a weight of zero by the downstream layers that compute the final prediction. -> **Warning sign:** You identify a strong syntactic feature via a probe, but when you intentionally corrupt that syntax in the input text, the model's final prediction does not change.
## 6. Key Quote / Core Insight
"Probes cannot tell us about whether the information that we identify has any causal relationship with the target model's behavior. It could be that the part of speech information is simply latently encoded, but not actually relevant to your model's input-output behavior."
## 7. Additional Resources & References
* **Resource:** Conneau et al. (2018) and Tenney et al. (2019) - **Type:** Paper - **Relevance:** Foundational texts for the core methodology of probing and its application in "BERTology".
* **Resource:** Hewitt and Liang (2019) - **Type:** Paper - **Relevance:** Introduced the crucial concepts of control tasks and probe selectivity to solve the problem of probe complexity.
* **Resource:** Belinkov and Glass (2019) and Vig et al. (2020) - **Type:** Paper - **Relevance:** Detailed the fundamental limitation that probes cannot establish causal inference.
* **Resource:** Saphra and Lopez (2019) - **Type:** Paper - **Relevance:** Explores Singular Vector Canonical Correlation Analysis (SVCCA) as an unsupervised probing method that avoids the pitfalls of supervised probe capacity.
* **Resource:** Clark et al. (2019) and Manning et al. (2020) - **Type:** Paper - **Relevance:** Unsupervised probing by inspecting attention weights.
* **Resource:** Hewitt and Manning (2019) and Chi et al. (2020) - **Type:** Paper - **Relevance:** Using linear transformations of hidden states to identify latent syntactic structures in BERT.
* **Resource:** Rogers et al. (2020): "A primer in BERTology: What we know about how BERT works" - **Type:** Paper - **Relevance:** Highly recommended comprehensive overview of probing efforts and findings regarding BERT representations.