PixelRNN / PixelCNN
Explicit tractable-density generative models for images. Treat the image as a 1D sequence of subpixels in raster / scanline order and autoregressively predict each next subpixel from all previous ones:

p(x) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i−1})

Each subpixel is an 8-bit integer, so the per-step model is a 256-way softmax classification. The loss is plain cross-entropy: no variational tricks, no adversarial game, and the likelihood can be evaluated exactly for any image.
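A minimal sketch of that loss, assuming a hypothetical `model` that maps a (B, C, H, W) image to per-subpixel logits of shape (B, 256, C, H, W) (the masking inside `model` is what enforces the autoregressive ordering):

```python
import torch
import torch.nn.functional as F

def nll_per_image(model, images_uint8):
    """Exact negative log-likelihood of each image under the AR model.

    `model` is an assumed autoregressive network: (B, C, H, W) float input ->
    (B, 256, C, H, W) logits, masked so position i only sees positions < i.
    """
    x = images_uint8.float() / 255.0          # network input, scaled to [0, 1]
    targets = images_uint8.long()             # class labels 0..255, shape (B, C, H, W)
    logits = model(x)                         # (B, 256, C, H, W)
    # Plain cross-entropy per subpixel; no lower bound, no discriminator.
    nll = F.cross_entropy(logits, targets, reduction="none")   # (B, C, H, W)
    return nll.flatten(1).sum(dim=1)          # summed over subpixels = exact -log p(image)
```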
Why 256-way softmax instead of predicting a continuous value?
Treating subpixels as categorical captures multimodality — e.g. an edge pixel could plausibly be dark OR bright, but almost never the mid-gray average. A regression loss (L2) would collapse to the blurry mean.
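A toy version of the collapse-to-mean argument (numbers invented for illustration):

```python
import torch

# An edge subpixel that is dark (0) half the time and bright (255) half the time.
samples = torch.tensor([0.0, 255.0, 0.0, 255.0])

# The L2-optimal point prediction is the mean: mid-gray, which never actually occurs.
print(samples.mean())   # tensor(127.5000)

# A 256-way softmax can instead put ~0.5 mass on 0 and ~0.5 on 255, so
# sampling from it produces dark or bright values, never the gray average.
```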
PixelRNN (van den Oord et al., ICML 2016)
Row-by-row generation with an RNN (LSTM variants: Row LSTM, Diagonal BiLSTM). Context for each position comes from the recurrent hidden state. The Row LSTM sweeps down the image one row at a time and captures a triangular region above each pixel; the Diagonal BiLSTM scans along the diagonals in both directions and covers the full available context (everything above and to the left).
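A simplified sketch of the row-to-row recurrence idea (illustrative assumptions throughout: a plain tanh recurrence stands in for the LSTM gates, and the layer sizes are made up; this is not the paper's exact Row LSTM):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RowRecurrence(nn.Module):
    """Toy row-wise recurrence: hidden states for row r see (a) row r-1's
    hidden states via a conv (context from above) and (b) row r's inputs via
    a strictly-left causal conv, so position (r, c) never sees pixel (r, c)."""

    def __init__(self, channels=64, k=3):
        super().__init__()
        self.k = k
        self.state_conv = nn.Conv1d(channels, channels, k, padding=k // 2)
        self.input_conv = nn.Conv1d(channels, channels, k)  # padded manually below

    def forward(self, x):                     # x: (B, C, H, W) input features
        B, C, H, W = x.shape
        h = x.new_zeros(B, C, W)
        rows = []
        for r in range(H):                    # sequential over rows only
            left = F.pad(x[:, :, r], (self.k, 0))      # shift so conv sees columns < c
            left = self.input_conv(left)[:, :, :W]     # strictly-left context in row r
            h = torch.tanh(self.state_conv(h) + left)  # tanh stands in for LSTM gates
            rows.append(h)
        return torch.stack(rows, dim=2)       # (B, C, H, W) hidden states
```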
PixelCNN (van den Oord et al., NeurIPS 2016)
Replace the RNN with masked convolutions so the receptive field at each position only includes already-generated pixels (top-left neighborhood). Training is fully parallel over an image (same as training an autoregressive Transformer) — much faster than the recurrent PixelRNN.
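The standard way to implement this (a common sketch, close in spirit to but not copied from the paper): subclass Conv2d and zero out the kernel weights that would look at the current or future positions. Per-channel ordering within a pixel (R before G before B) is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Conv whose kernel is masked to the top-left neighborhood.

    Mask type 'A' (first layer) also hides the center position; type 'B'
    (later layers) allows it, since by then the center feature only carries
    information from earlier pixels. Per-channel RGB masking omitted here.
    """

    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kH, kW = self.kernel_size
        mask = torch.zeros_like(self.weight)  # (out, in, kH, kW)
        mask[:, :, :kH // 2, :] = 1           # all rows strictly above
        mask[:, :, kH // 2, :kW // 2] = 1     # same row, strictly left of center
        if mask_type == "B":
            mask[:, :, kH // 2, kW // 2] = 1  # center position allowed
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Mask applied functionally, so no "future" weight is ever used.
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```

Stacking one type-'A' layer followed by type-'B' layers gives every output a receptive field covering only already-generated pixels, so a whole training image goes through in one parallel forward pass. (This simple single-stack masking has a known blind spot in the receptive field; the NeurIPS 2016 gated variant fixes it with separate vertical and horizontal stacks.)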
Sampling is still sequential: one subpixel at a time, conditioned on all previously sampled ones.
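The corresponding sampling loop (same assumed `model` interface as in the loss sketch above; note the full forward pass per subpixel, which is exactly the cost the next section is about):

```python
import torch

@torch.no_grad()
def sample(model, B=1, C=3, H=32, W=32, device="cpu"):
    img = torch.zeros(B, C, H, W, device=device)
    for r in range(H):                # raster / scanline order
        for c in range(W):
            for ch in range(C):       # R, then G, then B within each pixel
                logits = model(img)[:, :, ch, r, c]            # (B, 256)
                probs = torch.softmax(logits, dim=-1)
                val = torch.multinomial(probs, num_samples=1)  # (B, 1)
                img[:, ch, r, c] = val.squeeze(-1).float() / 255.0
    return (img * 255).round().to(torch.uint8)
```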
Problem: scale
A 1024×1024 RGB image is 3 million subpixels = 3M sequential sampling steps. Even 256×256 is ~200K (256 × 256 × 3 = 196,608). This is why modern autoregressive image models (VQ-VAE + AR, ImageGPT, MaskGIT, Parti) model tiles or discrete tokens instead of raw subpixels, reducing sequence length by 100–1000×.
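Back-of-the-envelope for the token route (the 32×32 grid here is an assumed but typical VQ-VAE configuration, not a quoted figure):

```python
raw_steps   = 256 * 256 * 3      # 196,608 subpixel sampling steps
token_steps = 32 * 32            # 1,024 discrete-token steps
print(raw_steps // token_steps)  # 192x fewer sequential steps
```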
Source
CS231n 2025 Lec 13 slides ~55–60 (PixelRNN/PixelCNN as autoregressive models of images, raster-order subpixels, 256-way softmax, 3M-subpixel scaling problem, foreshadow of tile-based AR).