Evidence Lower Bound (ELBO)

Used in VAE.

The derivation uses several concepts: the marginal (log-)likelihood, Jensen's inequality, and the KL divergence.

Why ELBO?

The ELBO is useful because it provides a worst-case guarantee (a lower bound) on the log-likelihood of a model distribution (e.g. $p_\theta(x)$) fit to a set of data.

Deriving the ELBO

Start with the intractable log-likelihood:

$$\log p_\theta(x) = \log \int p_\theta(x, z)\, dz$$

Insert any distribution $q(z)$ over the latent variable:

$$\log p_\theta(x) = \log \int q(z)\, \frac{p_\theta(x, z)}{q(z)}\, dz$$

Interpret the integral as an expectation:

$$\log p_\theta(x) = \log \mathbb{E}_{q(z)}\!\left[\frac{p_\theta(x, z)}{q(z)}\right]$$

Apply Jensen's inequality ($\log \mathbb{E}[X] \ge \mathbb{E}[\log X]$, since $\log$ is concave):

$$\log p_\theta(x) \ge \mathbb{E}_{q(z)}\!\left[\log \frac{p_\theta(x, z)}{q(z)}\right]$$

Define the ELBO:

$$\mathcal{L}(q, \theta; x) = \mathbb{E}_{q(z)}\!\left[\log \frac{p_\theta(x, z)}{q(z)}\right]$$

Expand $p_\theta(x, z) = p_\theta(x \mid z)\, p(z)$:

$$\mathcal{L} = \mathbb{E}_{q(z)}\big[\log p_\theta(x \mid z)\big] + \mathbb{E}_{q(z)}\big[\log p(z)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big]$$

The last two terms form a KL divergence:

$$\mathbb{E}_{q(z)}\big[\log p(z)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big] = -D_{\mathrm{KL}}\big(q(z) \,\|\, p(z)\big)$$

Final ELBO expression:

$$\mathcal{L} = \mathbb{E}_{q(z)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q(z) \,\|\, p(z)\big)$$

**In VAEs, we use $q(z) = q_\phi(z \mid x)$, the approximate posterior produced by the encoder.**
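A quick numerical sanity check of the Jensen step (an added illustration, not from any source): for any positive random variable $W$, here standing in for the ratio $p_\theta(x, z)/q(z)$ inside the expectation, we should find $\mathbb{E}[\log W] \le \log \mathbb{E}[W]$. A minimal NumPy sketch:

```python
# Sanity check of Jensen's inequality for log: E[log W] <= log E[W].
# W plays the role of the ratio p_theta(x, z) / q(z) inside the expectation.
import numpy as np

rng = np.random.default_rng(0)
w = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # positive samples of W

elbo_side = np.mean(np.log(w))      # E[log W]  (the ELBO side of the bound)
loglik_side = np.log(np.mean(w))    # log E[W]  (the log-likelihood side)
print(elbo_side, loglik_side, elbo_side <= loglik_side)  # ~0.0, ~0.5, True
```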

Why is this a lower bound?

There is an exact identity:

$$\log p_\theta(x) = \mathcal{L}(q, \theta; x) + D_{\mathrm{KL}}\big(q(z) \,\|\, p_\theta(z \mid x)\big)$$

Since the KL divergence is always nonnegative:

$$D_{\mathrm{KL}}\big(q(z) \,\|\, p_\theta(z \mid x)\big) \ge 0,$$

we get:

$$\log p_\theta(x) \ge \mathcal{L}(q, \theta; x)$$

The gap is exactly:

$$\log p_\theta(x) - \mathcal{L}(q, \theta; x) = D_{\mathrm{KL}}\big(q(z) \,\|\, p_\theta(z \mid x)\big)$$

Interpretation:

  • ELBO is never larger than the true log-likelihood.
  • Maximizing the ELBO pushes $q(z)$ toward the true posterior $p_\theta(z \mid x)$.
  • As $q$ improves, the KL gap shrinks.
  • If $q(z) = p_\theta(z \mid x)$, the ELBO is exactly equal to $\log p_\theta(x)$.

Thus ELBO is the best computable surrogate for the true log-likelihood.
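To make the gap identity concrete, here is a small numerical sketch (an added illustration with a made-up two-state latent, not from the notes) that verifies $\log p_\theta(x) = \mathcal{L} + D_{\mathrm{KL}}\big(q(z) \,\|\, p_\theta(z \mid x)\big)$ exactly by enumerating a binary latent:

```python
# Verify log p(x) = ELBO + KL(q(z) || p(z|x)) on a toy two-state latent model.
import numpy as np

p_z = np.array([0.3, 0.7])           # prior over a binary latent z
p_x_given_z = np.array([0.9, 0.2])   # likelihood of one fixed observation x

p_xz = p_z * p_x_given_z             # joint p(x, z) for that x
log_px = np.log(p_xz.sum())          # exact marginal log-likelihood
p_z_given_x = p_xz / p_xz.sum()      # exact posterior p(z|x)

q = np.array([0.5, 0.5])             # any approximate posterior q(z)
elbo = np.sum(q * (np.log(p_xz) - np.log(q)))            # E_q[log p(x,z) - log q(z)]
kl_gap = np.sum(q * (np.log(q) - np.log(p_z_given_x)))   # KL(q || p(z|x))

print(np.isclose(log_px, elbo + kl_gap))  # True: the gap is exactly the KL
print(elbo <= log_px)                     # True: the ELBO never exceeds log p(x)
```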

Alternative derivation (CS231n 2025 Lec 13)

CS231n derives the ELBO without invoking Jensen, using pure algebra on Bayes' rule. Start from

$$\log p_\theta(x) = \log \frac{p_\theta(x \mid z)\, p(z)}{p_\theta(z \mid x)}$$

Multiply top and bottom by $q_\phi(z \mid x)$ and split into three log terms:

$$\log p_\theta(x) = \log p_\theta(x \mid z) - \log \frac{q_\phi(z \mid x)}{p(z)} + \log \frac{q_\phi(z \mid x)}{p_\theta(z \mid x)}$$

$\log p_\theta(x)$ doesn't depend on $z$, so we can wrap the whole RHS in $\mathbb{E}_{z \sim q_\phi(z \mid x)}$ without changing anything:

$$\log p_\theta(x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) + D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big)$$

The last term is a KL divergence, so it is $\ge 0$, and we can't compute it (it depends on the intractable true posterior $p_\theta(z \mid x)$). Drop it to get a lower bound:

$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) = \mathrm{ELBO}(\theta, \phi; x)$$

This is the VAE training objective — both terms are tractable (closed-form KL for Gaussians, Monte-Carlo reconstruction via the reparametrization trick).

Equivalent framing to the Jensen derivation above, but this one makes the “gap = KL to the true posterior” identity visible on a single line.
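A minimal PyTorch-style sketch of this objective, assuming hypothetical `encoder` and `decoder` modules: the encoder returns the mean and log-variance of a diagonal Gaussian $q_\phi(z \mid x)$, and the decoder returns Bernoulli logits for $p_\theta(x \mid z)$. The closed-form Gaussian KL and the reparametrization trick are the standard ones; the module interfaces themselves are assumptions, not the lecture's code:

```python
import torch
import torch.nn.functional as F

def vae_negative_elbo(x, encoder, decoder):
    """One-sample Monte-Carlo estimate of -ELBO for a Gaussian-encoder,
    Bernoulli-decoder VAE. `encoder` and `decoder` are assumed nn.Modules."""
    mu, logvar = encoder(x)                      # parameters of q_phi(z|x)

    # Reparametrization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients can flow back through mu and logvar.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps

    # Reconstruction term: Monte-Carlo estimate of -E_q[log p_theta(x|z)],
    # with the decoder producing one Bernoulli logit per dimension of x.
    logits = decoder(z)
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")

    # KL(q_phi(z|x) || N(0, I)) in closed form for a diagonal Gaussian.
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())

    return recon + kl  # minimizing this maximizes the ELBO
```

Averaging this over a minibatch and minimizing it with any optimizer trains both networks on the ELBO objective.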

Source

CS231n 2025 Lec 13 slides ~92–102 (Bayes → multiply by $q_\phi(z \mid x)$ → split into three log terms → wrap in expectation → identify two KLs → drop the posterior KL to get ELBO).