Expectation Maximization (EM)
EM is a soft clustering algorithm that iteratively estimates parameters of latent-variable models by alternating expectation and maximization steps.
Why EM instead of direct MLE on a GMM?
The analytical MLE for a GMM has no closed form (the derivative of the log of a sum of Gaussians is ugly). EM introduces latent cluster assignments so that each M-step has a closed form.
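To see the problem concretely, direct MLE would maximize the incomplete-data log-likelihood (standard GMM notation, matching the symbols used below):

$$\ell(\theta) = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right)$$

Setting $\partial \ell / \partial \mu_k = 0$ gives $\mu_k = \frac{\sum_i \gamma_{ik} x_i}{\sum_i \gamma_{ik}}$, where $\gamma_{ik}$ is the posterior weight of cluster $k$ for point $i$ (the responsibility defined in the E-step below). Since $\gamma_{ik}$ itself depends on $\mu_k$, the stationarity conditions are all coupled; EM decouples them by freezing the $\gamma_{ik}$ at each iteration.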
Unlike K-Means Clustering (hard clustering), EM assigns soft probabilities of cluster membership.
Intuition
Chicken-and-egg. If you knew which Gaussian each point came from, fitting the means and variances is trivial (just MLE per cluster). If you knew the means and variances, assigning points is trivial (pick the most likely cluster). EM alternates: given current params, soft-assign points (E-step); given soft assignments, refit params (M-step). Each full pass provably increases the data log-likelihood, so you climb a hill in parameter space until it stops moving.
We can equivalently describe a GMM with latent variables:

$$z_i \sim \mathrm{Categorical}(\pi_1, \dots, \pi_K), \qquad x_i \mid z_i = k \sim \mathcal{N}(\mu_k, \Sigma_k)$$

where:
- $z_i \in \{1, \dots, K\}$ is the (hidden) cluster assignment for point $x_i$
- $\pi_k = p(z_i = k)$ is the prior probability of cluster $k$

The joint distribution of observed and hidden is

$$p(x_i, z_i = k) = \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

The marginal $p(x_i)$:

$$p(x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

If the latent $z_i$ were observed, the complete-data log-likelihood would be

$$\log p(x, z) = \sum_{i=1}^{N} \left[ \log \pi_{z_i} + \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right]$$

which is easy to optimize in closed form. Since the $z_i$ are hidden, EM replaces them with their expected values (responsibilities $\gamma_{ik}$), leading to the iterative updates below.
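To make the generative story concrete, here is a minimal NumPy sketch (parameter values are made up for illustration) of sampling from a two-component 1-D mixture: draw the hidden $z_i$ first, then draw $x_i$ from the chosen Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical K = 2 mixture: weights, means, standard deviations
pi = np.array([0.3, 0.7])     # mixing weights, p(z = k)
mu = np.array([-2.0, 3.0])    # component means
sigma = np.array([0.5, 1.0])  # component standard deviations

N = 1000
z = rng.choice(len(pi), size=N, p=pi)  # latent assignment z_i ~ Categorical(pi)
x = rng.normal(mu[z], sigma[z])        # observed x_i ~ N(mu_{z_i}, sigma_{z_i}^2)
# EM only ever sees x; the z that generated each point stays hidden.
```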
Resources
- https://arxiv.org/pdf/cs/0412015
- https://chrischoy.github.io/posts/Expectation-Maximization-and-Variational-Inference/
- Paper tutorial: https://ieeexplore-ieee-org.proxy.lib.uwaterloo.ca/stamp/stamp.jsp?tp=&arnumber=543975
You first assume the data is sampled from a set of $K$ Gaussians.
Basic EM Algorithm for Gaussian Mixture
- Initialization: assume $K$ Gaussians with parameters $(\mu_k, \Sigma_k)$. Initialize $\mu_k$, $\Sigma_k$, and mixing weights $\pi_k$ randomly
- E-step (Expectation): compute the Posterior Probability (responsibility) $\gamma_{ik}$ that point $x_i$ belongs to component $k$:

$$\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$

The responsibility is the softmax-looking answer to “how much does cluster $k$ claim point $x_i$?”. Numerator is “how likely cluster $k$ would have produced $x_i$”; denominator normalizes across clusters.
- M-step (Maximization): update parameters using weighted averages. This is the standard Gaussian MLE, except each point contributes with weight $\gamma_{ik}$ instead of 0/1. A point sitting between two clusters pulls both means partially toward itself.

$$\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik} \, x_i}{\sum_{i=1}^{N} \gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}}{\sum_{i=1}^{N} \gamma_{ik}}$$

Mixing weights:

$$\pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}$$
- Repeat: alternate E-step and M-step until convergence (parameters stabilize or log-likelihood stops improving)
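Putting the steps together, here is a minimal NumPy sketch of EM for a 1-D Gaussian mixture (my own variable names and initialization choices, not a reference implementation; the general multivariate case swaps the variances for covariance matrices):

```python
import numpy as np

def gmm_em(x, K, n_iters=100, tol=1e-6, seed=0):
    """Fit a 1-D Gaussian mixture to data x with EM.
    Returns (pi, mu, var, log_likelihood_trace)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    # Initialization: random means drawn from the data, shared variance, uniform weights
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, np.var(x))
    pi = np.full(K, 1.0 / K)

    prev_ll, lls = -np.inf, []
    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, k], computed in log space for stability
        log_p = (np.log(pi)
                 - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)  # (N, K): log pi_k + log N(x_i | mu_k, var_k)
        log_norm = np.logaddexp.reduce(log_p, axis=1)   # (N,): log p(x_i)
        gamma = np.exp(log_p - log_norm[:, None])       # rows sum to 1

        # M-step: weighted Gaussian MLE, each point weighted by gamma[i, k]
        Nk = gamma.sum(axis=0)                          # effective count per cluster
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / N

        # Convergence check: the log-likelihood is guaranteed not to decrease
        ll = log_norm.sum()
        lls.append(ll)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, var, lls
```

Run on the synthetic data from the sampling sketch above, this should recover parameters close to the generating `pi`, `mu`, and `sigma**2` (up to a permutation of cluster labels).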