Generative Adversarial Network (GAN)
A GAN trains a generator and discriminator in a minimax game so the generator learns to produce samples indistinguishable from real data.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
where:
- $G(z)$ is the generator mapping noise $z \sim p_z$ to a sample
- $D(x)$ is the discriminator estimating the probability that a sample $x$ is real
Intuition
A two-player game between a forger and a detective. The generator forges samples from noise, the discriminator tries to tell forgeries from real data, and both improve by playing each other. At Nash equilibrium the forgeries are indistinguishable, i.e. $p_g = p_{\text{data}}$ and $D(x) = \tfrac{1}{2}$ everywhere. Notice there's no explicit density to write down: we never compute $p_g(x)$, we just need to sample from it, which is exactly what $G$ does.
To steven: you're probably looking for A Neural Representation of Sketch Drawings and https://quickdraw.withgoogle.com/.
Resources:
- https://developers.google.com/machine-learning/gan
- Intro video by Computerphile
- Training tips: https://github.com/soumith/ganhacks
- Applications: https://github.com/nashory/gans-awesome-applications#3d-object-generation
- Tutorial: https://www.youtube.com/watch?v=Mng57Tj18pc&ab_channel=DigitalSreeni
- Course: https://github.com/johnowhitaker/aiaiart
GANs are composed of two Neural Nets:
- The generator network creates new synthetic images trying to fool the discriminator
- The discriminator network learns to tell a fake, synthesized image from a real one
The competition between both networks allows them to improve, until the generator becomes so good that fake samples cannot be distinguished from real ones.
Examples: DCGAN, StyleGAN, CycleGAN
Mathematical Formalization
Formally, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution.
$G$ and $D$ play the following two-player minimax game with value function $V(D, G)$:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
where:
- $G$ and $D$ are differentiable functions, each represented by an MLP
- $D(x)$ is the probability that $x$ came from the data rather than from $p_g$
- $D$ is trained to maximize the probability of assigning the correct label to both training examples and samples from $G$
- $G$ takes $z$ as input and outputs an image, and is trained to minimize $\log(1 - D(G(z)))$
With a GAN, given enough iterations, $p_g$ converges to $p_{\text{data}}$. In other words, from a random vector $z$, the network $G$ can synthesize an image that resembles one drawn from the true distribution $p_{\text{data}}$.
The discriminator is a traditional CNN. The generator takes in random noise and produces an image.
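A minimal sketch of such a discriminator in PyTorch; the 64x64 RGB input size and layer widths are illustrative assumptions, not from the source:

```python
import torch.nn as nn

# Hypothetical small CNN discriminator: 64x64 RGB image -> single real/fake logit.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), # 16x16 -> 8x8
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(256 * 8 * 8, 1),                   # logit; D(x) = sigmoid(logit)
)
```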
Wasserstein GAN
Wasserstein Metric GAN: https://www.youtube.com/watch?v=xs9uibPODGk&ab_channel=Oneworldtheoreticalmachinelearning
Wasserstein GAN Paper: https://arxiv.org/pdf/1701.07875.pdf
Walkthrough (CS231n 2025 Lec 14)
Setup
GAN is the implicit-direct branch of the Goodfellow taxonomy: no explicit density $p_G(x)$, just sampling. Sample $z$ (e.g. $z \sim \mathcal{N}(0, I)$), pass through generator $G$, get $x = G(z)$ from implicit distribution $p_G$. The goal is $p_G = p_{\text{data}}$, but we have no way to compute $p_G(x)$, so we can't do MLE. Trick: train a discriminator $D$ to classify real vs fake, then use it as the loss for $G$.
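The "implicit" part in two lines of PyTorch; the tiny MLP generator here is a hypothetical stand-in:

```python
import torch
import torch.nn as nn

latent_dim = 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 784))  # hypothetical generator

z = torch.randn(16, latent_dim)  # sample z ~ N(0, I)
x = G(z)                         # x ~ p_G: we can draw samples, but never evaluate p_G(x)
```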
Minimax objective
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$
- Inner max: $D$ wants to push $D(x)$ toward 1 on real samples and $D(G(z))$ toward 0 on fakes
- Outer min: $G$ wants $D(G(z)) \to 1$ (fool the discriminator)
Trained via alternating gradient updates: gradient ascent on $D$, then gradient descent on $G$.
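A sketch of one alternating update in PyTorch, with tiny placeholder MLPs standing in for $G$ and $D$ (sizes and hyperparameters are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, data_dim = 64, 784  # assumed sizes, e.g. flattened 28x28 images
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))  # outputs a logit

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_step(real):  # real: (batch, data_dim)
    # D step (inner max): ascend E[log D(x)] + E[log(1 - D(G(z)))]
    z = torch.randn(real.size(0), latent_dim)
    fake = G(z).detach()  # block gradients into G during the D update
    # softplus(-l) = -log sigmoid(l) = -log D;  softplus(l) = -log(1 - D)
    loss_D = F.softplus(-D(real)).mean() + F.softplus(D(fake)).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # G step (outer min): descend E[log(1 - D(G(z)))] (saturating form; see fix below)
    z = torch.randn(real.size(0), latent_dim)
    loss_G = -F.softplus(D(G(z))).mean()  # = E[log(1 - D(G(z)))]
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```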
No single loss curve to monitor
You're chasing a saddle point, not minimizing a function. Loss values on their own don't tell you whether training is working.
Saturation problem & non-saturating fix
At the start of training, $G$ is bad, so $D(G(z)) \approx 0$ and $\log(1 - D(G(z)))$ is flat there, with near-zero gradient. $G$ can't learn from a vanishing signal.
Fix (Goodfellow 2014): instead of minimizing $\log(1 - D(G(z)))$, train $G$ to maximize $\log D(G(z))$. Same direction (fool $D$), but the gradient is large precisely when $D$ is winning. Now standard.
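In the `train_step` sketch above, the fix is a one-line swap of the generator loss (same logit convention):

```python
# Saturating (original minimax) loss: gradient magnitude ~ sigmoid(logit) -> 0 when D wins.
loss_G = -F.softplus(D(G(z))).mean()  # = E[log(1 - D(G(z)))]
# Non-saturating fix: maximize log D(G(z)); gradient stays large exactly when D is winning.
loss_G = F.softplus(-D(G(z))).mean()  # = E[-log D(G(z))]
```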
Optimal discriminator
For fixed $G$, the optimal discriminator has a closed form:
$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}$$
Plugging $D^*$ back into the outer min, the global optimum is $p_G = p_{\text{data}}$. If $D$ is optimal at every step, $p_G$ converges to the data distribution. Caveats in practice: finite-capacity nets, no convergence guarantee from alternating SGD, mode collapse.
Read $D^*$ as a likelihood ratio: at points where real data is more likely than fake, $D^*(x) > \tfrac{1}{2}$; where fake is more likely, $D^*(x) < \tfrac{1}{2}$. The only way for $D^*$ to be exactly $\tfrac{1}{2}$ everywhere (a maximally confused detective) is $p_G = p_{\text{data}}$.
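A sketch of the standard derivation (per Goodfellow et al. 2014): fix $G$, write $V$ as an integral over $x$, and maximize the integrand pointwise.

```latex
% V(D,G) = \int_x [ p_data(x) log D(x) + p_G(x) log(1 - D(x)) ] dx
% For constants a, b > 0, f(D) = a log D + b log(1 - D) on (0,1) peaks where
%   f'(D) = a/D - b/(1 - D) = 0  =>  a(1 - D) = b D  =>  D = a / (a + b).
% With a = p_data(x) and b = p_G(x), pointwise:
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}
```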
DC-GAN (Radford et al. ICLR 2016)
First architecture to make GANs work on non-toy data: strided/transposed convolutions, BatchNorm, no fully-connected layers, ReLU/LeakyReLU. Aside: Alec Radford later did GPT-1 and GPT-2.
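A sketch of a DCGAN-style generator following that recipe; the 100-d noise to 64x64 RGB sizing is a common configuration assumed here, not taken from the slides:

```python
import torch.nn as nn

# DCGAN-style generator: transposed convs, BatchNorm, ReLU, Tanh output,
# no fully-connected layers. Input shape: (N, nz, 1, 1).
def dcgan_generator(nz=100, ngf=64):
    return nn.Sequential(
        nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),        # 1x1 -> 4x4
        nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),   # -> 8x8
        nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),   # -> 16x16
        nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),       # -> 32x32
        nn.BatchNorm2d(ngf), nn.ReLU(True),
        nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),             # -> 64x64 RGB
        nn.Tanh(),
    )
```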
StyleGAN (Karras et al. CVPR 2019)
Two-stage generator: a mapping network produces a style vector $w$ from $z$, and a synthesis network generates the image, conditioning every layer on $w$ via AdaIN (Adaptive Instance Norm):
$$\text{AdaIN}(x_i, y) = y_{s,i}\,\frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$$
where $(y_{s,i}, y_{b,i})$ are a per-channel scale and bias computed from $w$.
Per-layer style injection lets you control coarse vs fine attributes independently.
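A minimal AdaIN sketch in PyTorch; the per-channel scale/bias would come from a learned affine map of $w$, which is assumed here rather than shown:

```python
import torch

def adain(x, y_scale, y_bias, eps=1e-5):
    # x: (N, C, H, W) feature maps; y_scale, y_bias: (N, C) style params
    # produced from w by a learned affine layer (assumed, not shown).
    mu = x.mean(dim=(2, 3), keepdim=True)     # per-sample, per-channel mean
    sigma = x.std(dim=(2, 3), keepdim=True)   # per-sample, per-channel std
    x_norm = (x - mu) / (sigma + eps)         # instance-normalize
    return y_scale[:, :, None, None] * x_norm + y_bias[:, :, None, None]
```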
Latent-space interpolation
Linearly interpolate $z_t = (1 - t)\,z_1 + t\,z_2$ for $t \in [0, 1]$ and decode $G(z_t)$. The output sweeps smoothly between the two endpoints. StyleGAN3 cat morphs are the canonical demo.
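A sketch of the sweep; the tiny untrained `G` here is a placeholder for a trained generator:

```python
import torch
import torch.nn as nn

latent_dim = 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 784))  # stand-in generator

z1 = torch.randn(1, latent_dim)      # two endpoints in latent space
z2 = torch.randn(1, latent_dim)

frames = []
for t in torch.linspace(0, 1, steps=10):
    z_t = (1 - t) * z1 + t * z2      # linear interpolation in z-space
    frames.append(G(z_t))            # decode each point; frames morph smoothly
```

In practice spherical interpolation (slerp) is often preferred over lerp for Gaussian priors, since linear midpoints have smaller norm than typical prior samples.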
Summary: pros & cons
- Pros: simple objective, very high sample quality, fast single-step generation
- Cons: no loss curve to monitor, training is unstable (mode collapse, vanishing gradients), hard to scale, no explicit $p_G(x)$ for likelihood evaluation
- Era: GANs were the go-to image generator from ~2016-2021, then displaced by diffusion
From CS231n 2025 Lec 14 slides ~14-35 (minimax setup, alternating updates, saturation problem and non-saturating fix, optimal discriminator derivation, DC-GAN, StyleGAN AdaIN, latent interpolation, GAN summary pros/cons).