# Markov Networks: Gibbs Sampling

**Video Category:** Machine Learning / Probabilistic Graphical Models Tutorial

## 📋 0. Video Metadata

**Video Title:** Markov networks: Gibbs sampling
**YouTube Channel:** Stanford Engineering
**Publication Date:** Not shown in video
**Video Duration:** ~18 minutes

## 📝 1. Core Summary (TL;DR)

Computing exact marginal probabilities in a complex Markov network requires summing over all possible assignments, which takes exponential time and is often intractable. Gibbs sampling addresses this with a randomized local search: it iteratively updates one variable at a time, sampling from the conditional distribution implied by that variable's neighboring factors. By counting how frequently each variable visits each state during this random exploration, the algorithm produces accurate approximations of the true marginal probabilities.

## 2. Core Concepts & Frameworks

* **Markov Network:**
  -> **Meaning:** A probabilistic graphical model, based on a factor graph, that defines a joint probability distribution over a set of random variables $X = (X_1, \dots, X_n)$. The probability of an assignment is proportional to the product of local factor weights, normalized by a constant $Z$.
  -> **Application:** Modeling complex dependencies in fields like object tracking or computer vision.
* **Marginal Probability:**
  -> **Meaning:** The probability that a specific, single variable $X_i$ takes a specific value, regardless of the states of all other variables. It is calculated by summing the joint probabilities of all complete assignments in which $X_i$ equals that value.
  -> **Application:** Answering specific queries like "What is the probability the object is at location 1?" without needing to know the exact locations of all other objects.
* **Gibbs Sampling:**
  -> **Meaning:** A randomized algorithm for approximating marginal probabilities.
  It repeatedly selects a single variable, holds all other variables fixed, computes a probability distribution for that variable from the local factor weights, and samples a new value from that distribution.
  -> **Application:** Approximating intractably large summations in probabilistic models, such as cleaning up noisy image pixels.
* **Iterated Conditional Modes (ICM):**
  -> **Meaning:** A deterministic search algorithm, used for constraint satisfaction problems (CSPs), that also updates one variable at a time but chooses the value yielding the *maximum* weight.
  -> **Application:** Finding a single, locally optimal assignment, in contrast to Gibbs sampling, which explores the space to estimate probability distributions.

## 3. Evidence & Examples (Hyper-Specific Details)

* **Object Tracking Toy Example:** The speaker demonstrates a 3-variable network ($X_1, X_2, X_3$). To compute the exact marginal $\mathbb{P}(X_2 = 1)$, the system sums the probabilities of all rows where $X_2 = 1$, yielding exactly 0.62. To approximate this with Gibbs sampling, the algorithm holds $X_1$ and $X_3$ fixed and resamples $X_2$. If the factor weights for $X_2 = \{0, 1, 2\}$ are 1, 2, and 2 respectively, they sum to 5; the algorithm normalizes these into probabilities (0.2, 0.4, 0.4) and "throws a dart" on a 0-to-1 interval to randomly select the new state.
* **Object Tracking Live Demo:** A browser-based demo runs Gibbs sampling on the tracking query $\mathbb{P}(X_2)$. As the algorithm loops through variables, it increments a counter for the assigned state. After 1000 iterations, the estimated marginal probability for $X_2 = 1$ converges to 0.61 (compared to the exact 0.62), and $X_2 = 2$ converges to 0.38.
* **Image Denoising Application:** The goal is to clean a noisy image in which variables $X_i \in \{0, 1\}$ represent pixel values (black or red). Factors dictate that neighboring pixels are twice as likely to be the same color as different ($t_{ij} = [x_i = x_j] + 1$).
  For a specific pixel, if setting it to 0 yields a local weight product of 2 and setting it to 1 yields a weight product of 8, the sum is 10; Gibbs sampling then assigns the pixel 0 with probability 0.2 and 1 with probability 0.8.
* **Image Denoising Live Demo:** The visual demo sweeps across a grid of pixels. When `showMarginals = false`, the pixels rapidly flicker between absolute black and red assignments as they are randomly sampled. When `showMarginals = true`, the visualization averages the visitations, displaying continuous shades between black and red that represent the network's probabilistic "best guess" for the clean image.
* **Denoising Parameter Tuning:** Lowering `missingFrac` to 0.3 (fewer missing pixels) yields a much faster, cleaner reconstruction. Setting `missingFrac = 1.0` (completely unobserved) results in blind sampling from the prior. Raising the `coherenceFactor` from 2 to 10 forces the model to strongly prefer similarity, producing a phase transition in which the image groups into large, solid blocks of color.

## 4. Actionable Takeaways (Implementation Rules)

* **Rule 1: Convert local weights to probabilities**
  -> In Gibbs sampling, do not greedily pick the maximum-weight assignment. Calculate the weights for all possible states of the target variable, sum them, and divide each by the sum to create a normalized probability distribution.
  -> **Result:** The algorithm explores the state space proportionally rather than getting stuck in a local optimum.
* **Rule 2: Count visitations to estimate marginals**
  -> Maintain an array of counters for every variable. Each time a variable is sampled and assigned a value, increment the counter for that specific value.
  -> **Result:** Dividing a value's count by the total number of iterations gives the estimated marginal probability $\hat{\mathbb{P}}$.
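Rules 1 and 2 can be sketched together as a minimal Gibbs sampler over a small chain network. This is an illustrative Python sketch, not the video's demo code: the pairwise agreement factor, the 3-variable chain layout, and all names here are assumptions chosen to mirror the toy example above.

```python
import random

# Hypothetical pairwise factor: neighboring variables prefer to agree
# (the video's exact weights are not shown; this is illustrative).
def factor(a, b):
    return 2 if a == b else 1

DOMAIN = [0, 1, 2]
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}  # chain x1 - x2 - x3

def gibbs_marginals(num_iters=10000, seed=0):
    rng = random.Random(seed)
    x = [0, 0, 0]                                  # current assignment
    counts = [{v: 0 for v in DOMAIN} for _ in x]   # Rule 2: visit counters

    for _ in range(num_iters):
        for i in range(len(x)):
            # Rule 1: local weight of each candidate value for x_i,
            # holding all other variables fixed
            weights = []
            for v in DOMAIN:
                w = 1.0
                for j in NEIGHBORS[i]:
                    w *= factor(v, x[j])
                weights.append(w)
            # normalize implicitly and "throw a dart" on [0, total)
            r = rng.random() * sum(weights)
            for v, w in zip(DOMAIN, weights):
                r -= w
                if r < 0:
                    x[i] = v
                    break
            counts[i][x[i]] += 1

    # estimated marginal P-hat(X_i = v) = visit count / iterations
    return [{v: c / num_iters for v, c in ci.items()} for ci in counts]
```

With these symmetric agreement factors the true marginals are uniform (1/3 per value), so the estimates should hover near 1/3 after enough sweeps; swapping in asymmetric factors reproduces the kind of skewed marginals seen in the tracking demo.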
* **Rule 3: Visualize the marginals, not the raw samples**
  -> For visual data like images, plotting the raw binary samples looks like flickering noise. Render the continuous probability estimate (the running average) instead.
  -> **Result:** A stable, highly readable view of the model's underlying confidence.
* **Rule 4: Ensure all network weights are positive**
  -> To guarantee that Gibbs sampling eventually converges to the true marginals, design the factors so that no assignment has a weight of exactly zero.
  -> **Result:** This satisfies the technical condition of ergodicity, ensuring the sampler can traverse the entire state space without getting trapped by zero-probability walls.

## 5. Pitfalls & Limitations (Anti-Patterns)

* **Pitfall:** Including zero-weight factors (hard deterministic constraints) in the model.
  -> **Why it fails:** Zero weights act as impenetrable walls in the state space. If the state space becomes partitioned, the local sampler cannot travel between disconnected regions, breaking the core assumption of the algorithm.
  -> **Warning sign:** The marginal estimates converge to wildly incorrect values because entire valid regions of the state space are never visited.
* **Pitfall:** Assuming the algorithm will converge quickly on complex problems.
  -> **Why it fails:** Gibbs sampling offers asymptotic guarantees of correctness, but its "mixing time" (the time required to reach the true distribution) can be exponentially long, depending on the complexity and local dependencies of the network.
  -> **Warning sign:** The estimated probabilities remain highly volatile or stuck in localized patterns even after millions of iterations.

## 6. Key Quote / Core Insight

"It seems at first glance kind of a wild thing... we're running this algorithm, it's generating samples left and right, it's completely random.
And yet, the randomness is very carefully orchestrated so that when I sum things up properly, I actually get the true right answer out."

## 7. Additional Resources & References

* **Resource:** Iterated Conditional Modes (ICM)
  - **Type:** Algorithm
  - **Relevance:** Serves as the deterministic, optimization-focused counterpart to Gibbs sampling.
* **Resource:** Markov chain Monte Carlo (MCMC)
  - **Type:** Algorithmic Framework
  - **Relevance:** The broader toolkit of randomized procedures for estimating probabilities, of which Gibbs sampling is a specific, foundational instance.
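As a companion to the image-denoising example in Section 3, the single-pixel update (local weights 2 and 8 summing to 10) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names, the `strength` parameter, and the 4-neighbor layout are hypothetical, not the demo's actual code.

```python
import random

def coherence_factor(a, b, strength=2):
    # neighboring pixels are `strength` times as likely to match as to differ
    return strength if a == b else 1

def sample_pixel(neighbor_values, rng, strength=2):
    """Resample one binary pixel given its neighbors' current values."""
    # local weight of each candidate value is the product of the
    # coherence factors with every neighbor
    weights = []
    for v in (0, 1):
        w = 1
        for nb in neighbor_values:
            w *= coherence_factor(v, nb, strength)
        weights.append(w)
    # e.g. neighbors (1, 1, 1, 0) give weights [2, 8] -> probs 0.2 / 0.8
    total = weights[0] + weights[1]
    return 1 if rng.random() * total < weights[1] else 0
```

Sweeping `sample_pixel` over every pixel of a grid while tallying visit counts, as in Rule 2, reproduces the marginal shading the demo displays when `showMarginals = true`.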