High-Resolution Image Synthesis with Latent Diffusion Models

By running the diffusion process in a compressed latent space rather than in pixel space, we can greatly reduce the computational load of training and sampling.
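A rough back-of-the-envelope comparison illustrates the savings. The shapes below are assumptions for illustration: a 256×256 RGB image versus a latent with downsampling factor f=8 and 4 channels, one configuration used in the LDM paper.

```python
# Size of a pixel-space sample vs. an assumed f=8, 4-channel latent.
pixel_dims = 256 * 256 * 3   # 196,608 values per image
latent_dims = 32 * 32 * 4    # 4,096 values per latent
reduction = pixel_dims / latent_dims
print(reduction)  # 48.0
```

So each diffusion step operates on roughly 48× fewer values than it would in pixel space.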

Two stage training:

  1. Train an autoencoder that encodes images into a compact latent space
  2. Train a diffusion model to denoise in that latent space
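The two stages above can be sketched as follows. This is a toy illustration, not the paper's method: stage 1 uses a closed-form linear autoencoder (PCA via SVD) instead of a KL/VQ-regularized autoencoder, and stage 2 trains a linear noise predictor at a single noise level instead of a time-conditioned U-Net. All shapes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: fit an autoencoder on toy "images" (64-dim vectors -> 8-dim latents). ---
X = rng.normal(size=(500, 64))
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:8].T                              # (64, 8) linear encoder weights


def encode(x):
    return x @ W                          # pixels -> latent


def decode(z):
    return z @ W.T                        # latent -> pixels


# --- Stage 2: train a denoiser to predict the noise added to latents. ---
Z = encode(X)                             # (500, 8) latent codes
A = np.zeros((8, 8))                      # toy linear noise predictor
lr = 0.01
for step in range(200):
    eps = rng.normal(size=Z.shape)        # sample fresh noise each step
    z_noisy = Z + eps                     # single fixed noise level, for brevity
    pred = z_noisy @ A                    # predicted noise
    grad = z_noisy.T @ (pred - eps) / len(Z)
    A -= lr * grad                        # gradient step on the MSE loss
loss = np.mean((z_noisy @ A - eps) ** 2)
print(loss)                               # well below the no-predictor baseline of 1.0
```

The key structural point survives the simplification: the autoencoder is frozen after stage 1, and the diffusion objective (noise prediction under MSE) is optimized purely on latent codes.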

From the paper: “Being likelihood-based models, they do not exhibit mode-collapse and training instabilities as GANs and, by heavily exploiting parameter sharing, they can model highly complex distributions of natural images without involving billions of parameters as in AR models”