Generative Model
You have GANs and diffusion models that can generate data. There's also GPT-3.
Types of generative models (source); a toy sketch contrasting the two training objectives follows this list:
- Likelihood-based models: approximate the data's probability distribution p(x) directly and train by maximizing likelihood. Ex: autoregressive models, VAEs.
- Implicit / Score-Based Models: do not model p(x) explicitly, or use alternative objectives to generate samples. Ex:
    - Generative Adversarial Network (GAN): a generator maps noise to samples, trained against a discriminator; no density is ever written down.
    - Diffusion Models: learn to reverse a gradual noising process, closely tied to learning the score ∇_x log p(x).
    - Energy-Based Model (EBM): defines p(x) ∝ exp(−E(x)) up to an intractable normalizing constant.
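To make the split concrete, here is a minimal PyTorch-style sketch (my own toy example, not from any of the sources): the likelihood-based side trains by maximizing an explicit log p(x), while the GAN side never touches a density, only a discriminator's judgment. The tiny Gaussian model and the G/D networks are made-up stand-ins.

```python
import torch
import torch.nn as nn

x = torch.randn(64, 8)  # pretend batch of 8-dim "data points"

# Likelihood-based: the model defines an explicit density; train by maximizing log p(x).
# Toy case: a factorized Gaussian with a learnable mean / log-std per dimension.
mu = nn.Parameter(torch.zeros(8))
log_std = nn.Parameter(torch.zeros(8))
dist = torch.distributions.Normal(mu, log_std.exp())
nll = -dist.log_prob(x).sum(dim=1).mean()   # exact log p(x), minimized as NLL
nll.backward()

# Implicit (GAN-style): no p(x) anywhere, just a sampler trained adversarially.
G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 8))   # noise -> sample
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
z = torch.randn(64, 4)
fake = G(z)
bce = nn.BCEWithLogitsLoss()
d_loss = bce(D(x), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
g_loss = bce(D(fake), torch.ones(64, 1))    # G is scored only by D, never by a density
```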
I still don't fully get the difference....?
It's not about starting from noise and then denoising.
What is p(x)?
- p(x) is the probability density (or mass) of a data point x under your model.
- It tells you how likely your model thinks that x is (toy numeric example below).
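A toy numeric example (mine, not from the source) of what evaluating p(x) looks like. Here the "model" is just a 1-D Gaussian fit to some fake data; real generative models put a density over images or text, but the reading is the same: high p(x) means the model finds x plausible.

```python
import numpy as np
from scipy.stats import norm

data = 2.0 * np.random.randn(1000) + 3.0   # toy "training set"
mu, sigma = data.mean(), data.std()        # fit the model

x_typical = 3.1    # looks like the training data
x_weird = 40.0     # unlike anything seen in training
print(norm.pdf(x_typical, mu, sigma))  # relatively high density -> model thinks it's likely
print(norm.pdf(x_weird, mu, sigma))    # essentially zero -> model thinks it's implausible
```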
Does this matter anymore?
Like, just slap a transformer on the data, does this really matter? The architecture is becoming standardized (transformers), but the generative modeling paradigm (diffusion vs. autoregressive vs. GAN) still shapes what the model does and how it learns.
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ https://yang-song.net/blog/2021/score/

- From Lilian Weng's blog, it seems that all of these are really similar (see the score view below).
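One way to see the overlap, paraphrasing the score-based view from the Yang Song post (my gloss, not a line from these notes): an EBM only defines p(x) up to an intractable normalizer Z, but the score drops that normalizer, and diffusion / score-based models learn exactly this score at many noise levels:

$$
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta}
\qquad\Longrightarrow\qquad
\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)
$$

Sampling then just follows the learned score from noise toward data, so Z is never needed.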
Taxonomy (CS231n 2025 Lec 13)
CS231n uses the Goodfellow 2017 tree: it splits by how the model relates to the density p(x), with the normalization constraint (p(x) must integrate to 1) implying that different values of x compete for probability mass:
```
                  Generative models
                 /                 \
   Explicit density             Implicit density
   (model computes p(x))        (can only sample from p(x))
      /         \                  /          \
 Tractable    Approximate       Direct      Indirect
     |             |               |             |
Autoregressive    VAE             GAN        Diffusion
```
- Tractable → can actually evaluate p(x), e.g. autoregressive models via the chain rule p(x) = ∏_i p(x_i | x_{<i}) (sketch after these bullets).
- Approximate → can't evaluate p(x) exactly, but can bound/approximate it (VAE: maximize the ELBO, a lower bound on log p(x), instead of log p(x) itself).
- Direct implicit → single-shot sample from a noise vector z (GAN generator).
- Indirect implicit → iterative sampling procedure (diffusion: denoise over T steps).
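A small sketch (mine, assuming a toy GRU language model) of the "tractable" leaf: an autoregressive model gives an exact log p(x) by summing conditional log-probabilities along the sequence. The first-token term is skipped here for brevity (normally handled with a BOS token).

```python
import torch
import torch.nn as nn

vocab, T = 5, 6
tokens = torch.randint(0, vocab, (1, T))     # one toy sequence x_1..x_T

emb = nn.Embedding(vocab, 16)
rnn = nn.GRU(16, 16, batch_first=True)
head = nn.Linear(16, vocab)

h, _ = rnn(emb(tokens[:, :-1]))              # hidden state only ever sees x_<i
log_probs = head(h).log_softmax(-1)          # log p(x_i | x_<i) for i = 2..T
log_px = log_probs.gather(-1, tokens[:, 1:, None]).sum()   # chain rule: sum of log conditionals
print(log_px.item())                         # exact log-likelihood of the sequence (given x_1)
```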
Discriminative vs generative vs conditional-generative:
| Models | Used for | Learns |
|---|---|---|
| Discriminative | classification | p(y\|x) |
| Generative | density estimation / sampling / anomaly detection | p(x) |
| Conditional generative | class-conditional / text-to-image generation | p(x\|y) |
Bayes ties them together: p(y|x) = p(x|y) p(y) / p(x), so a generative model plus a class prior gives a discriminative one.
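A toy illustration (mine, not from the slides) of that Bayes flip: fit a class-conditional density p(x|y) per class plus a prior p(y), and the posterior p(y|x) = p(x|y) p(y) / p(x) is a classifier, with p(x) = Σ_y p(x|y) p(y).

```python
import numpy as np
from scipy.stats import norm

x0 = np.random.randn(200) + 0.0        # class-0 training samples
x1 = np.random.randn(200) + 4.0        # class-1 training samples
prior = np.array([0.5, 0.5])           # p(y)

def posterior(x):
    like = np.array([norm.pdf(x, x0.mean(), x0.std()),    # p(x | y=0)
                     norm.pdf(x, x1.mean(), x1.std())])   # p(x | y=1)
    joint = like * prior               # p(x, y)
    return joint / joint.sum()         # p(y | x) = p(x, y) / p(x)

print(posterior(0.3))   # mostly class 0
print(posterior(3.8))   # mostly class 1
```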
Source
CS231n 2025 Lec 13 slides ~35-47, 113-115 (discriminative/generative/conditional split, density normalization, taxonomy tree, Goodfellow attribution).
Generative Video Models
Do generative video models understand physics?
- https://arxiv.org/pdf/2501.09038 argues NO.
It's just learning to correlate frames, but it has no understanding of the world's physics.