Maximum Likelihood Estimation

Expectation Maximization (EM)

EM is a soft clustering algorithm, unlike K-Means Clustering, which performs hard clustering (each point belongs to exactly one cluster).

The motivation is that maximum likelihood estimation for GMMs has no closed-form solution: the log-likelihood involves the log of a sum of Gaussians, and setting its derivative to zero cannot be solved analytically. Instead of maximizing this log-likelihood directly, EM introduces latent variables for the cluster assignments.

We can equivalently describe a GMM with latent variables $z_n \in \{1, \dots, K\}$, where $z_n$ indicates which component generated data point $x_n$:

The joint distribution of observed $x_n$ and hidden $z_n$ is

$p(x_n, z_n = k) = p(z_n = k)\, p(x_n \mid z_n = k) = \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$

  • by the chain rule of probability

The marginal probability of $x_n$ is obtained by summing the joint over all values of $z_n$:

$p(x_n) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$
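As a quick illustration (not from the lecture), here is a minimal NumPy/SciPy snippet that evaluates the joint $p(x_n, z_n = k)$ and the marginal $p(x_n)$ for a single point under made-up parameters $\pi_k, \mu_k, \Sigma_k$:

```python
# Sketch only: joint and marginal of a GMM at one point, with assumed parameters.
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component GMM in 2D: mixing weights pi, means mu, covariances Sigma.
pi = np.array([0.4, 0.6])
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), 2.0 * np.eye(2)]

x = np.array([1.0, 1.0])

# Joint p(x, z=k) = p(z=k) * p(x | z=k)  (chain rule)
joint = np.array([pi[k] * multivariate_normal(mu[k], Sigma[k]).pdf(x) for k in range(2)])

# Marginal p(x) = sum_k p(x, z=k)
marginal = joint.sum()
print(joint, marginal)
```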

If the latent $z_n$ were observed, the complete-data log-likelihood would be

$\log p(X, Z) = \sum_{n=1}^{N} \left[ \log \pi_{z_n} + \log \mathcal{N}(x_n \mid \mu_{z_n}, \Sigma_{z_n}) \right]$

  • which is easy to optimize in closed form, because the log now acts on a single Gaussian per point instead of a sum of Gaussians.

Since the $z_n$ are hidden, EM replaces them with their expected values (the responsibilities $\gamma_{nk}$), which leads to the iterative EM updates. That is how we arrive at the equations below.
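Concretely, in standard notation (not spelled out above), the quantity the M-step maximizes is the expected complete-data log-likelihood:

$$Q(\theta, \theta^{\text{old}}) = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \left[ \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right], \qquad \gamma_{nk} = p(z_n = k \mid x_n, \theta^{\text{old}})$$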

This lecture is really good:

Resources

You first assume the data is sampled from a set of $K$ Gaussians.

Basic EM Algorithm for Gaussian Mixture

  1. Initialization
    Assume $K$ Gaussians with parameters $(\mu_k, \Sigma_k)$. Initialize the means $\mu_k$, covariances $\Sigma_k$, and mixing weights $\pi_k$ randomly.

  2. E-step (Expectation)
    Compute the posterior probability (responsibility) that point $x_n$ belongs to component $k$:

    $\gamma_{nk} = \dfrac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$

  3. M-step (Maximization)
    Update parameters using responsibility-weighted averages, with $N_k = \sum_{n=1}^{N} \gamma_{nk}$:

    $\mu_k = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk}\, x_n$

    $\Sigma_k = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk}\, (x_n - \mu_k)(x_n - \mu_k)^\top$

    Mixing weights (not covered in the video, but a standard part of the M-step): $\pi_k = \dfrac{N_k}{N}$

  4. Repeat
    Alternate the E-step and M-step until convergence (parameters stabilize or the log-likelihood stops improving). A runnable sketch of the full loop follows this list.
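Below is a minimal NumPy/SciPy sketch of the loop described above. It is illustrative rather than the lecture's code; the function name em_gmm and the initialization scheme (means drawn from the data, identity covariances, uniform mixing weights) are assumptions.

```python
# Minimal EM-for-GMM sketch: E-step, M-step, and a log-likelihood convergence check.
import numpy as np
from scipy.stats import multivariate_normal


def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape

    # 1. Initialization: random means from the data, identity covariances, uniform weights.
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.array([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E-step: responsibilities gamma[n, k] = p(z_n = k | x_n).
        dens = np.column_stack([
            pi[k] * multivariate_normal(mu[k], Sigma[k]).pdf(X) for k in range(K)
        ])                                      # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # 3. M-step: responsibility-weighted updates.
        Nk = gamma.sum(axis=0)                  # effective number of points per component
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(D)        # small jitter for numerical stability
        pi = Nk / N

        # 4. Repeat until the log-likelihood stops improving.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return pi, mu, Sigma, gamma
```

Usage would look like pi, mu, Sigma, gamma = em_gmm(X, K=3), where gamma[n, k] is the soft assignment of point n to component k.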