Evidence Lower Bound (ELBO)

Used in variational autoencoders (VAEs).

The derivation uses several concepts: expectations, Jensen’s inequality, and the KL divergence.

Why ELBO?

The ELBO is useful because it gives a guaranteed lower bound on the log-likelihood of a distribution (e.g. $p(x)$) that models a set of data: the true value $\log p(x)$ can never fall below it.

Deriving the ELBO

Start with the intractable log-likelihood:

$$\log p(x) = \log \int p(x, z)\, dz$$

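Insert any distribution $q(z)$ (with $q(z) > 0$ wherever $p(x, z) > 0$):

$$\log p(x) = \log \int q(z)\, \frac{p(x, z)}{q(z)}\, dz$$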

Interpret the integral as an expectation:

$$\log p(x) = \log \mathbb{E}_{q(z)}\left[\frac{p(x, z)}{q(z)}\right]$$

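Apply Jensen’s inequality ($\log \mathbb{E}[X] \ge \mathbb{E}[\log X]$, since $\log$ is concave):

$$\log p(x) \ge \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right]$$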
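
A quick numerical sanity check of Jensen's inequality for the logarithm may help here. This is only a sketch: the helper `jensen_gap` and the uniform samples are illustrative choices, not part of the derivation.

```python
import numpy as np

def jensen_gap(samples):
    """Return (log of the mean, mean of the log) for positive samples."""
    samples = np.asarray(samples, dtype=float)
    log_of_mean = np.log(samples.mean())
    mean_of_log = np.log(samples).mean()
    return log_of_mean, mean_of_log

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=10_000)  # an arbitrary positive random variable
log_of_mean, mean_of_log = jensen_gap(x)

# Jensen: log E[X] >= E[log X], with equality only when X is constant.
print(log_of_mean >= mean_of_log)
```

The two sides coincide only when all samples are equal, which mirrors the fact that the bound becomes tight only for a specific choice of $q(z)$.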

Define the ELBO:

$$\mathrm{ELBO} = \mathbb{E}_{q(z)}\left[\log p(x, z) - \log q(z)\right]$$

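Expand $p(x, z) = p(x \mid z)\, p(z)$:

$$\mathrm{ELBO} = \mathbb{E}_{q(z)}[\log p(x \mid z)] + \mathbb{E}_{q(z)}[\log p(z)] - \mathbb{E}_{q(z)}[\log q(z)]$$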

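The last two terms form a (negative) KL divergence:

$$\mathbb{E}_{q(z)}[\log p(z)] - \mathbb{E}_{q(z)}[\log q(z)] = -\mathrm{KL}(q(z) \,\|\, p(z))$$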

Final ELBO expression:

$$\mathrm{ELBO} = \mathbb{E}_{q(z)}[\log p(x \mid z)] - \mathrm{KL}(q(z) \,\|\, p(z))$$
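
To make the two forms of the ELBO concrete, here is a minimal sketch on a toy model with a three-valued latent variable. The numbers in `prior`, `likelihood`, and `q` are made up for illustration; the point is only that the joint form and the reconstruction-minus-KL form agree, and that both sit below $\log p(x)$.

```python
import numpy as np

# Hypothetical toy model: latent z takes 3 values, x is one fixed observation.
prior = np.array([0.5, 0.3, 0.2])          # p(z)
likelihood = np.array([0.10, 0.40, 0.70])  # p(x | z) for the observed x
q = np.array([0.2, 0.3, 0.5])              # any variational distribution q(z)

def elbo_joint_form(q, prior, likelihood):
    """ELBO = E_q[log p(x, z) - log q(z)]."""
    joint = prior * likelihood
    return np.sum(q * (np.log(joint) - np.log(q)))

def elbo_kl_form(q, prior, likelihood):
    """ELBO = E_q[log p(x | z)] - KL(q(z) || p(z))."""
    reconstruction = np.sum(q * np.log(likelihood))
    kl = np.sum(q * (np.log(q) - np.log(prior)))
    return reconstruction - kl

# log p(x) is computable here only because z has three states.
log_px = np.log(np.sum(prior * likelihood))

print(elbo_joint_form(q, prior, likelihood), elbo_kl_form(q, prior, likelihood), log_px)
```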

**In VAEs, we use $q(z) = q_\phi(z \mid x)$.**
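
As a rough sketch of how this looks in a VAE, assume (these are assumptions, not something stated in these notes) that the encoder outputs a diagonal Gaussian with parameters `mu` and `log_var`, and that the prior is a standard normal. The KL term then has a closed form, and the reconstruction term can be estimated by sampling. The decoder is replaced by a hypothetical stand-in, `toy_log_likelihood`.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), in closed form."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def elbo_estimate(x, mu, log_var, log_likelihood, rng, num_samples=16):
    """Monte Carlo ELBO: mean of log p(x | z) over z ~ q(z | x), minus the KL."""
    std = np.exp(0.5 * log_var)
    eps = rng.standard_normal((num_samples, mu.shape[0]))
    z = mu + std * eps  # reparameterized samples from q(z | x)
    reconstruction = np.mean([log_likelihood(x, z_i) for z_i in z])
    return reconstruction - gaussian_kl_to_standard_normal(mu, log_var)

# Hypothetical stand-in for a decoder: p(x | z) = N(x; z, I).
def toy_log_likelihood(x, z):
    return -0.5 * np.sum((x - z) ** 2) - 0.5 * x.shape[0] * np.log(2 * np.pi)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])
# In a real VAE these two vectors would be produced by the encoder network.
mu, log_var = np.array([0.2, -0.4]), np.array([-0.5, -0.5])

print(elbo_estimate(x, mu, log_var, toy_log_likelihood, rng))
```

Sampling as `mu + std * eps` keeps the sample a differentiable function of the encoder outputs, which is what allows gradient-based training in an autodiff framework; plain NumPy is used here only to show the arithmetic.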

Why is this a lower bound?

There is an exact identity:

$$\log p(x) = \mathrm{ELBO} + \mathrm{KL}(q(z) \,\|\, p(z \mid x))$$

Since the KL divergence is always nonnegative:

$$\mathrm{KL}(q(z) \,\|\, p(z \mid x)) \ge 0$$

we get:

$$\log p(x) \ge \mathrm{ELBO}$$

The gap is exactly:

$$\log p(x) - \mathrm{ELBO} = \mathrm{KL}(q(z) \,\|\, p(z \mid x))$$

Interpretation:

  • ELBO is never larger than the true log-likelihood.
  • Maximizing the ELBO over $q(z)$ makes $q(z)$ approach the true posterior $p(z \mid x)$.
  • As $q(z)$ improves, the KL gap shrinks.
  • If $q(z) = p(z \mid x)$, the ELBO is exactly equal to $\log p(x)$.
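
The points above can be checked numerically on a small discrete model where the posterior is available exactly. This is a sketch under made-up numbers: `q_start` is a deliberately poor choice of $q(z)$, and the interpolation toward the posterior is just one convenient way to "improve" $q$.

```python
import numpy as np

# Hypothetical toy model with a 3-valued latent, small enough to solve exactly.
prior = np.array([0.5, 0.3, 0.2])          # p(z)
likelihood = np.array([0.10, 0.40, 0.70])  # p(x | z) for one observed x

joint = prior * likelihood                 # p(x, z)
log_px = np.log(joint.sum())               # log p(x)
posterior = joint / joint.sum()            # p(z | x)

def elbo(q):
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl(q, p):
    return np.sum(q * (np.log(q) - np.log(p)))

q_start = np.array([0.6, 0.3, 0.1])        # a deliberately poor q(z)
gaps = []
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    q = (1 - t) * q_start + t * posterior  # move q toward the true posterior
    gap = log_px - elbo(q)
    assert np.isclose(gap, kl(q, posterior))  # gap equals KL(q || p(z | x))
    gaps.append(gap)

print(gaps)  # nonnegative, shrinking, and zero (up to rounding) at t = 1
```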

Thus the ELBO is a computable surrogate for the true log-likelihood: maximizing it raises a guaranteed lower bound on $\log p(x)$.