Generative Model
You have GANs and diffusion models that can generate data. There's also GPT-3.
Types of generative models (source); a toy sketch contrasting the two training objectives follows this list:
- Likelihood-based models: approximate the data's probability distribution p(x) directly and train by maximizing likelihood. Ex: autoregressive models, VAEs.
- Implicit / Score-Based Models: do not model p(x) explicitly, or use alternative objectives to generate samples. Ex:
    - Generative Adversarial Network (GAN): a generator maps noise to samples, trained against a discriminator; no density is ever written down.
    - Diffusion Models: learn to reverse a gradual noising process, closely tied to learning the score ∇_x log p(x).
    - Energy-Based Model (EBM): defines p(x) ∝ exp(−E(x)) up to an intractable normalizing constant.
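To make the split concrete, here is a minimal PyTorch-style sketch (my own toy example, not from any of the sources): the likelihood-based side trains by maximizing an explicit log p(x), while the GAN side never touches a density, only a discriminator's judgment. The tiny Gaussian model and the G/D networks are made-up stand-ins.

```python
import torch
import torch.nn as nn

x = torch.randn(64, 8)  # pretend batch of 8-dim "data points"

# Likelihood-based: the model defines an explicit density; train by maximizing log p(x).
# Toy case: a factorized Gaussian with a learnable mean / log-std per dimension.
mu = nn.Parameter(torch.zeros(8))
log_std = nn.Parameter(torch.zeros(8))
dist = torch.distributions.Normal(mu, log_std.exp())
nll = -dist.log_prob(x).sum(dim=1).mean()   # exact log p(x), minimized as NLL
nll.backward()

# Implicit (GAN-style): no p(x) anywhere, just a sampler trained adversarially.
G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 8))   # noise -> sample
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
z = torch.randn(64, 4)
fake = G(z)
bce = nn.BCEWithLogitsLoss()
d_loss = bce(D(x), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
g_loss = bce(D(fake), torch.ones(64, 1))    # G is scored only by D, never by a density
```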
I still don't fully get the difference....?
It's not about starting from noise and then denoising.
What is p(x)?
- p(x) is the probability density (or mass) of a data point x under your model.
- It tells you how likely your model thinks that x is (toy numeric example below).
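A toy numeric example (mine, not from the source) of what evaluating p(x) looks like. Here the "model" is just a 1-D Gaussian fit to some fake data; real generative models put a density over images or text, but the reading is the same: high p(x) means the model finds x plausible.

```python
import numpy as np
from scipy.stats import norm

data = 2.0 * np.random.randn(1000) + 3.0   # toy "training set"
mu, sigma = data.mean(), data.std()        # fit the model

x_typical = 3.1    # looks like the training data
x_weird = 40.0     # unlike anything seen in training
print(norm.pdf(x_typical, mu, sigma))  # relatively high density -> model thinks it's likely
print(norm.pdf(x_weird, mu, sigma))    # essentially zero -> model thinks it's implausible
```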
Does this matter anymore?
Like, just slap a transformer on the data, does this really matter? The architecture is becoming standardized (transformers), but the generative modeling paradigm (diffusion vs. autoregressive vs. GAN) still shapes what the model does and how it learns.
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ https://yang-song.net/blog/2021/score/

- From Lilian Weng's blog, it seems that all of these are really similar (see the score view below).
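One way to see the overlap, paraphrasing the score-based view from the Yang Song post (my gloss, not a line from these notes): an EBM only defines p(x) up to an intractable normalizer Z, but the score drops that normalizer, and diffusion / score-based models learn exactly this score at many noise levels:

$$
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta}
\qquad\Longrightarrow\qquad
\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)
$$

Sampling then just follows the learned score from noise toward data, so Z is never needed.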
Taxonomy (CS231n 2025 Lec 13)
CS231n uses the Goodfellow 2017 tree: it splits by how the model relates to the density p(x), with the normalization constraint (p(x) must integrate to 1) implying that different values of x compete for probability mass:
```
                  Generative models
                 /                 \
   Explicit density             Implicit density
   (model computes p(x))        (can only sample from p(x))
      /         \                  /          \
 Tractable    Approximate       Direct      Indirect
     |             |               |             |
Autoregressive    VAE             GAN        Diffusion
```
- Tractable → can actually evaluate p(x), e.g. autoregressive models via the chain rule p(x) = ∏_i p(x_i | x_{<i}) (sketch after these bullets).
- Approximate → can't evaluate p(x) exactly, but can bound/approximate it (VAE: maximize the ELBO, a lower bound on log p(x), instead of log p(x) itself).
- Direct implicit → single-shot sample from a noise vector z (GAN generator).
- Indirect implicit → iterative sampling procedure (diffusion: denoise over T steps).
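A small sketch (mine, assuming a toy GRU language model) of the "tractable" leaf: an autoregressive model gives an exact log p(x) by summing conditional log-probabilities along the sequence. The first-token term is skipped here for brevity (normally handled with a BOS token).

```python
import torch
import torch.nn as nn

vocab, T = 5, 6
tokens = torch.randint(0, vocab, (1, T))     # one toy sequence x_1..x_T

emb = nn.Embedding(vocab, 16)
rnn = nn.GRU(16, 16, batch_first=True)
head = nn.Linear(16, vocab)

h, _ = rnn(emb(tokens[:, :-1]))              # hidden state only ever sees x_<i
log_probs = head(h).log_softmax(-1)          # log p(x_i | x_<i) for i = 2..T
log_px = log_probs.gather(-1, tokens[:, 1:, None]).sum()   # chain rule: sum of log conditionals
print(log_px.item())                         # exact log-likelihood of the sequence (given x_1)
```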
Discriminative vs generative vs conditional-generative:
| Models | Used for | Learns |
|---|---|---|
| Discriminative | classification | p(y\|x) |
| Generative | density estimation / sampling / anomaly detection | p(x) |
| Conditional generative | class-conditional / text-to-image generation | p(x\|y) |
Bayes ties them together: p(y|x) = p(x|y) p(y) / p(x), so a generative model plus a class prior gives a discriminative one.
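A toy illustration (mine, not from the slides) of that Bayes flip: fit a class-conditional density p(x|y) per class plus a prior p(y), and the posterior p(y|x) = p(x|y) p(y) / p(x) is a classifier, with p(x) = Σ_y p(x|y) p(y).

```python
import numpy as np
from scipy.stats import norm

x0 = np.random.randn(200) + 0.0        # class-0 training samples
x1 = np.random.randn(200) + 4.0        # class-1 training samples
prior = np.array([0.5, 0.5])           # p(y)

def posterior(x):
    like = np.array([norm.pdf(x, x0.mean(), x0.std()),    # p(x | y=0)
                     norm.pdf(x, x1.mean(), x1.std())])   # p(x | y=1)
    joint = like * prior               # p(x, y)
    return joint / joint.sum()         # p(y | x) = p(x, y) / p(x)

print(posterior(0.3))   # mostly class 0
print(posterior(3.8))   # mostly class 1
```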
Source
CS231n 2025 Lec 13 slides ~35-47, 113-115 (discriminative/generative/conditional split, density normalization, taxonomy tree, Goodfellow attribution).
Generative Video Models
Do generative video models understand physics?
- https://arxiv.org/pdf/2501.09038 argues NO.
It's just learning to correlate frames, but it has no understanding of the world's physics.