Denoising Diffusion Probabilistic Model (DDPM)
DDPMs are a class of generative models where output generation is modeled as a denoising process, often referred to as Stochastic Langevin Dynamics.
- Learning about these via Diffusion Policy paper
Oh this is just a standard diffusion model.
Resources
- https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
- https://huggingface.co/blog/annotated-diffusion
- https://arxiv.org/pdf/2006.11239
See Stable Diffusion for an implementation example?
At a high level, it's really only a 2-step process:
- A fixed (or predefined) forward diffusion process that adds Gaussian noise
- A learned reverse denoising diffusion process
1. Forward Diffusion Process in DDPM
In the forward diffusion process, at each timestep $t$, we add unit Gaussian noise to the previous sample $x_{t-1}$ to produce $x_t$:

$$x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

- $I$ is the Identity Matrix

This means you are sampling from a normal distribution with the following mean and variance:

$$\mu = \sqrt{1 - \beta_t}\, x_{t-1}, \qquad \Sigma = \beta_t I$$
I'm confused as to why they use $\sqrt{1 - \beta_t}$
This is to ensure that the total variance remains 1 (review Sum of Gaussians):

$$\begin{align}
\text{Var}(x_t) &= \text{Var}(\sqrt{1 - \beta_t} \cdot x_{t-1}) + \text{Var}(\sqrt{\beta_t} \cdot \epsilon) \\
&= (1 - \beta_t) \cdot \text{Var}(x_{t-1}) + \beta_t \cdot \text{Var}(\epsilon) \\
&= (1 - \beta_t) I + \beta_t I = I
\end{align}$$

This shows that using $\sqrt{1 - \beta_t}$ ensures that $x_t$ remains unit Gaussian.
As a conditional probability, this is written as:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$
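To make this concrete, here's a minimal PyTorch sketch of one forward step (my own toy code, not from the paper), with a quick check that the variance stays at 1:

```python
import torch

def forward_diffusion_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    """One step of q(x_t | x_{t-1}): scale the signal down, add Gaussian noise."""
    eps = torch.randn_like(x_prev)  # unit Gaussian noise
    return (1 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * eps

# Sanity check: if x_{t-1} has unit variance, x_t should too.
x = torch.randn(100_000)            # x_0 ~ N(0, I)
for t in range(10):
    x = forward_diffusion_step(x, beta_t=0.02)
print(x.var())                      # ~1.0
```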
Variance schedule
$\beta_t$ does not have to be constant at each time step; we actually define a variance schedule, $\{\beta_t \in (0, 1)\}_{t=1}^{T}$.
- Is a variance schedule really needed? Yes, it's similar to the ideas behind a Learning Rate schedule.
- Can be linear, quadratic, cosine, etc.; see the sketch below.
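A minimal sketch of these schedules (the linear range $10^{-4}$ to $0.02$ is the DDPM paper's; the cosine form is from Improved DDPM, Nichol & Dhariwal 2021, so treat its constants as an outside reference):

```python
import torch

def linear_beta_schedule(T: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> torch.Tensor:
    # DDPM paper's linear schedule from beta_1 up to beta_T
    return torch.linspace(beta_1, beta_T, T)

def quadratic_beta_schedule(T: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> torch.Tensor:
    # Linear in sqrt(beta), so beta grows quadratically
    return torch.linspace(beta_1 ** 0.5, beta_T ** 0.5, T) ** 2

def cosine_beta_schedule(T: int, s: float = 0.008) -> torch.Tensor:
    # Cosine schedule (Nichol & Dhariwal, 2021), defined through the
    # cumulative product alpha_bar rather than through beta directly.
    t = torch.linspace(0, T, T + 1)
    alpha_bar = torch.cos((t / T + s) / (1 + s) * torch.pi / 2) ** 2
    alpha_bar = alpha_bar / alpha_bar[0]
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return betas.clamp(max=0.999)
```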
2. Denoising Process
Now, let's say we want to reverse the process. We know how $q(x_t \mid x_{t-1})$ is calculated. Is the reverse, $q(x_{t-1} \mid x_t)$, doable? Review your Bayes Rule:

$$q(x_{t-1} \mid x_t) = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1})}{q(x_t)}$$
- The problem is that we don't know $q(x_{t-1} \mid x_t)$, or do we? That is exactly the thing we want to predict (the marginals $q(x_{t-1})$ and $q(x_t)$ are intractable, since they require the whole data distribution)
Okay, no problem, just slap on a universal function approximator, i.e. a neural network!!
- The neural net learns two parameters, $\mu_\theta$ and $\Sigma_\theta$:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
In the original paper
They only made the neural net learn the mean $\mu_\theta$, and fixed the variance to $\Sigma_\theta = \sigma_t^2 I$ (e.g. $\sigma_t^2 = \beta_t$).
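A minimal sketch of one reverse step under that fixed-variance choice (the mean formula is the DDPM paper's noise-prediction parameterization; `model`, a network that predicts $\epsilon$, and `betas`, the full schedule, are assumed):

```python
import torch

@torch.no_grad()
def reverse_step(model, x_t: torch.Tensor, t: int, betas: torch.Tensor) -> torch.Tensor:
    """One reverse step x_t -> x_{t-1}, following DDPM's sampling algorithm.
    `model(x_t, t)` is assumed to predict the noise eps that was added."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative product ("alpha bar")
    beta_t, alpha_t, alpha_bar_t = betas[t], alphas[t], alpha_bar[t]

    eps_pred = model(x_t, t)
    # Mean of p_theta(x_{t-1} | x_t), written in terms of the predicted noise
    mean = (x_t - beta_t / (1 - alpha_bar_t).sqrt() * eps_pred) / alpha_t.sqrt()

    if t == 0:
        return mean                             # no noise added at the final step
    z = torch.randn_like(x_t)
    sigma_t = beta_t.sqrt()                     # fixed variance: sigma_t^2 = beta_t
    return mean + sigma_t * z
```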
3. Objective function
Okay, so how do we formulate the objective function for the neural net to learn? In the DDPM paper this reduces to a simple MSE between the true noise and the predicted noise:

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\right]$$

We'll use a U-Net to parameterize $\epsilon_\theta$.
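And a minimal training-loss sketch for $L_{\text{simple}}$ (this relies on the paper's closed-form forward jump $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\, x_0, (1 - \bar\alpha_t) I)$, which these notes haven't derived yet; `model` is again an assumed noise-prediction network):

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x_0: torch.Tensor, betas: torch.Tensor) -> torch.Tensor:
    """L_simple: MSE between the true noise and the predicted noise."""
    T = betas.shape[0]
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, T, (x_0.shape[0],), device=x_0.device)  # random timestep per sample
    eps = torch.randn_like(x_0)                                   # the noise we try to recover

    # Closed-form forward jump: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps
    ab = alpha_bar[t].view(-1, *([1] * (x_0.dim() - 1)))          # broadcast over data dims
    x_t = ab.sqrt() * x_0 + (1 - ab).sqrt() * eps

    return F.mse_loss(model(x_t, t), eps)
```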