High-Resolution Image Synthesis with Latent Diffusion Models
By running the diffusion process in a learned latent space rather than in pixel space, the computational cost of training and sampling drops sharply.
Two-stage training:
- Train an autoencoder to compress images into a lower-dimensional latent space
- Train a diffusion model to denoise in that latent space
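The two stages above can be sketched as follows. This is a minimal, illustrative toy: the "autoencoder" is just average pooling with nearest-neighbor upsampling (the paper trains a KL- or VQ-regularized autoencoder), and only the forward noising step of diffusion is shown, applied to the latent instead of the pixels. All function names and the downsampling factor `f=8` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 stand-in: "encode" a 64x64 image to an 8x8 latent by average
# pooling; "decode" by nearest-neighbor upsampling. Purely illustrative.
def encode(img, f=8):
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def decode(z, f=8):
    return np.repeat(np.repeat(z, f, axis=0), f, axis=1)

# Stage 2: forward diffusion is applied to the latent z_0, not the pixels:
#   z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
# A denoiser would then be trained to predict eps from (z_t, t).
def noise_latent(z0, alpha_bar_t, rng):
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return zt, eps

img = rng.random((64, 64))
z0 = encode(img)  # 8x8 latent: 64x fewer values for the diffusion model
zt, eps = noise_latent(z0, alpha_bar_t=0.5, rng=rng)
print(img.size // z0.size)  # compression factor of this toy setup
```

The point of the sketch is the division of labor: the autoencoder handles perceptual compression once, and the (expensive, iterative) diffusion process only ever touches the much smaller latent.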
“Being likelihood-based models, they do not exhibit mode-collapse and training instabilities as GANs and, by heavily exploiting parameter sharing, they can model highly complex distributions of natural images without involving billions of parameters as in AR models”