Evidence Lower Bound (ELBO)
Used in variational autoencoders (VAEs).
The derivation uses several concepts: expectations, Jensen’s inequality, and the KL divergence.
Why ELBO?
The ELBO is useful because it gives a guaranteed lower bound (a worst-case value) on the log-likelihood of a model distribution (e.g. $p_\theta(x)$) that is fit to a set of data.
Deriving the ELBO
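Start with the intractable log-likelihood, written as a marginal over the latent variable $z$:
$$\log p_\theta(x) = \log \int p_\theta(x, z)\, dz$$
Insert any distribution $q(z)$ that is positive wherever $p_\theta(x, z)$ is positive:
$$\log p_\theta(x) = \log \int q(z)\, \frac{p_\theta(x, z)}{q(z)}\, dz$$
Interpret the integral as an expectation:
$$\log p_\theta(x) = \log \mathbb{E}_{q(z)}\left[\frac{p_\theta(x, z)}{q(z)}\right]$$
Apply Jensen’s Inequality ($\log \mathbb{E}[Y] \ge \mathbb{E}[\log Y]$, because $\log$ is concave):
$$\log p_\theta(x) \ge \mathbb{E}_{q(z)}\left[\log \frac{p_\theta(x, z)}{q(z)}\right]$$
Define the ELBO:
$$\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\left[\log p_\theta(x, z) - \log q(z)\right]$$
Expand $p_\theta(x, z) = p_\theta(x \mid z)\, p(z)$:
$$\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\left[\log p_\theta(x \mid z)\right] + \mathbb{E}_{q(z)}\left[\log p(z)\right] - \mathbb{E}_{q(z)}\left[\log q(z)\right]$$
The last two terms form a KL divergence:
$$\mathbb{E}_{q(z)}\left[\log p(z)\right] - \mathbb{E}_{q(z)}\left[\log q(z)\right] = -\mathrm{KL}\left(q(z) \,\|\, p(z)\right)$$
Final ELBO expression:
$$\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\left(q(z) \,\|\, p(z)\right)$$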
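As a sanity check on the derivation, here is a minimal sketch that evaluates the ELBO exactly for a latent variable with three values. The numbers in `prior`, `likelihood`, and `q` are made-up assumptions, and the function names are hypothetical. It computes the ELBO both as an expectation of log joint minus log q, and as a reconstruction term minus a KL term, so the two forms can be compared with each other and with the log-evidence.

```python
import math

# Hypothetical toy model: z takes 3 values, x is a single fixed observation.
prior = [0.5, 0.3, 0.2]          # p(z), assumed values
likelihood = [0.10, 0.40, 0.70]  # p(x | z) for the observed x, assumed values
q = [0.2, 0.3, 0.5]              # an arbitrary distribution over z


def elbo_joint_form(q, prior, likelihood):
    """E_q[log p(x, z) - log q(z)], summed exactly over z."""
    return sum(
        qz * (math.log(lik * pz) - math.log(qz))
        for qz, pz, lik in zip(q, prior, likelihood)
    )


def elbo_kl_form(q, prior, likelihood):
    """E_q[log p(x | z)] - KL(q(z) || p(z)), summed exactly over z."""
    reconstruction = sum(qz * math.log(lik) for qz, lik in zip(q, likelihood))
    kl_to_prior = sum(qz * math.log(qz / pz) for qz, pz in zip(q, prior))
    return reconstruction - kl_to_prior


# log p(x) is tractable here only because z has three values.
log_evidence = math.log(sum(pz * lik for pz, lik in zip(prior, likelihood)))

print("ELBO (joint form):", elbo_joint_form(q, prior, likelihood))
print("ELBO (KL form):   ", elbo_kl_form(q, prior, likelihood))
print("log p(x):         ", log_evidence)
```

The two ELBO values should agree up to floating-point error, and both should sit below the printed log-evidence.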
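**In VAEs, we use $q(z) = q_\phi(z \mid x)$, an encoder distribution with parameters $\phi$ that depends on the input $x$.**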
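In practice the expectation over z is usually estimated by sampling. The sketch below assumes a one-dimensional Gaussian toy model (prior N(0, 1) over z, likelihood N(z, 1) over x) and a Gaussian q with mean `m` and standard deviation `s` standing in for an encoder output. These choices and all function names are illustrative assumptions, not part of any particular VAE implementation. For this model the ELBO also has a closed form, which gives something to compare the sample average against.

```python
import math
import random


def log_normal_pdf(value, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)


def elbo_monte_carlo(x, m, s, num_samples=20000, seed=0):
    """Average of log p(x, z) - log q(z) over samples z ~ q = N(m, s^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(m, s)
        # Assumed model: p(z) = N(0, 1), p(x | z) = N(z, 1).
        log_joint = log_normal_pdf(x, z, 1.0) + log_normal_pdf(z, 0.0, 1.0)
        total += log_joint - log_normal_pdf(z, m, s * s)
    return total / num_samples


def elbo_closed_form(x, m, s):
    """E_q[log p(x | z)] - KL(q || prior) for the same Gaussian toy model."""
    reconstruction = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s * s)
    kl_to_prior = 0.5 * (m * m + s * s - 1.0 - math.log(s * s))
    return reconstruction - kl_to_prior


x, m, s = 1.0, 0.3, 0.8                    # hypothetical observation and q parameters
log_evidence = log_normal_pdf(x, 0.0, 2.0)  # marginal of this model is N(0, 2)

print("Monte Carlo ELBO:", elbo_monte_carlo(x, m, s))
print("Closed-form ELBO:", elbo_closed_form(x, m, s))
print("log p(x):        ", log_evidence)
```

A real VAE would produce `m` and `s` from a neural network and would rarely have a closed-form evidence to compare against. The toy model is chosen here only so that every quantity can be checked.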
Why is this a lower bound?
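There is an exact identity relating the log-likelihood, the ELBO, and the true posterior $p_\theta(z \mid x)$:
$$\log p_\theta(x) = \mathrm{ELBO}(q) + \mathrm{KL}\left(q(z) \,\|\, p_\theta(z \mid x)\right)$$
Since the KL divergence is always nonnegative:
$$\mathrm{KL}\left(q(z) \,\|\, p_\theta(z \mid x)\right) \ge 0$$
we get:
$$\log p_\theta(x) \ge \mathrm{ELBO}(q)$$
The gap is exactly:
$$\log p_\theta(x) - \mathrm{ELBO}(q) = \mathrm{KL}\left(q(z) \,\|\, p_\theta(z \mid x)\right)$$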
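The identity can be checked numerically whenever the posterior is computable. The following sketch assumes a three-valued latent z with made-up probabilities (the values and helper names are hypothetical). It computes the posterior by Bayes' rule and compares the gap between log-evidence and ELBO with the KL divergence from q to that posterior.

```python
import math

# Hypothetical toy model: z takes 3 values, x is a single fixed observation.
prior = [0.5, 0.3, 0.2]          # p(z), assumed values
likelihood = [0.10, 0.40, 0.70]  # p(x | z) for the observed x, assumed values
q = [0.2, 0.3, 0.5]              # an arbitrary distribution over z

evidence = sum(pz * lik for pz, lik in zip(prior, likelihood))           # p(x)
posterior = [pz * lik / evidence for pz, lik in zip(prior, likelihood)]  # p(z | x)


def elbo(q_dist):
    """E_q[log p(x, z) - log q(z)], summed exactly over z."""
    return sum(
        qz * (math.log(pz * lik) - math.log(qz))
        for qz, pz, lik in zip(q_dist, prior, likelihood)
    )


def kl(a, b):
    """KL(a || b) for two discrete distributions with full support."""
    return sum(az * math.log(az / bz) for az, bz in zip(a, b))


gap = math.log(evidence) - elbo(q)
print("log p(x) - ELBO:      ", gap)
print("KL(q || p(z | x)):    ", kl(q, posterior))
print("gap at q = posterior: ", math.log(evidence) - elbo(posterior))
```

The first two printed values should match up to floating-point error, and the last should be essentially zero.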
Interpretation:
- ELBO is never larger than the true log-likelihood.
- Maximizing ELBO with respect to $q$ makes $q(z)$ approach the true posterior $p_\theta(z \mid x)$.
- As $q$ improves, the KL gap shrinks.
- If $q(z) = p_\theta(z \mid x)$, the ELBO is exactly equal to $\log p_\theta(x)$.
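The points above can be illustrated with a small sketch under an assumed Gaussian toy model: prior N(0, 1) over z and likelihood N(z, 1) over x, for which the posterior given x is N(x/2, 1/2) and the evidence is N(0, 2). The starting parameters of q and the function names are hypothetical. The code moves the mean and variance of q in a straight line toward the posterior and records the gap between log-evidence and ELBO at each step.

```python
import math


def elbo_closed_form(x, mean, var):
    """ELBO for q = N(mean, var) under prior N(0, 1) and likelihood N(z, 1)."""
    reconstruction = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - mean) ** 2 + var)
    kl_to_prior = 0.5 * (mean * mean + var - 1.0 - math.log(var))
    return reconstruction - kl_to_prior


x = 1.0                                   # hypothetical observation
log_evidence = -0.5 * (math.log(2 * math.pi * 2.0) + x * x / 2.0)  # log N(x; 0, 2)
post_mean, post_var = x / 2.0, 0.5        # exact posterior for this toy model
start_mean, start_var = -1.0, 2.0         # assumed poor initial q

gaps = []
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    # Linear interpolation of q's parameters from the start toward the posterior.
    mean = (1 - t) * start_mean + t * post_mean
    var = (1 - t) * start_var + t * post_var
    gap = log_evidence - elbo_closed_form(x, mean, var)
    gaps.append(gap)
    print(f"t={t:.2f}  ELBO={elbo_closed_form(x, mean, var):.6f}  gap={gap:.6f}")
```

The gap should decrease at every step and reach zero (up to floating-point error) at t = 1, where q coincides with the posterior.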
Thus the ELBO is a computable surrogate for the true log-likelihood: it never overestimates it, and it becomes exact when $q$ matches the true posterior.