Convolution

Pooling

Pooling has no learnable weights; it only has hyperparameters: filter size, stride, and type (max or average).

  • Max Pooling β†’ Take the max within each filter

Break the input into smaller regions; max pooling just takes the maximum value in each region.

Intuition: as long as a feature is detected anywhere within one of these regions, it is preserved in the output of max pooling.

  • Average Pooling β†’ Take the average within each filter
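Both operations can be sketched in a few lines of numpy; this is a minimal illustration with a hypothetical 4×4 input and non-overlapping 2×2 regions, using a reshape trick to group each region's four values along one axis.

```python
import numpy as np

# Hypothetical 4x4 input feature map
x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 8, 3],
              [1, 9, 4, 4]], dtype=float)

# Break the input into non-overlapping 2x2 regions: reshape so each
# region's 4 values end up along the last axis.
regions = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

max_pooled = regions.max(axis=-1)   # max within each 2x2 region
avg_pooled = regions.mean(axis=-1)  # average within each 2x2 region

print(max_pooled)  # [[6. 5.] [9. 8.]]
print(avg_pooled)  # [[3.5  2.5 ] [4.75 4.75]]
```

Real frameworks (e.g., `torch.nn.MaxPool2d`) do the same thing per channel, with stride and kernel size as hyperparameters.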

This reduces the size or resolution of a signal (e.g., an image):

Downsampling

  1. In CNNs (e.g., U-Net):
    • Max Pooling: Keeps the largest value in a region (e.g., 2Γ—2).
    • Average Pooling: Takes the average of values in the region.
    • Strided Convolution: A convolution operation with a stride > 1 (e.g., 2), which skips pixels.
  2. In signal/image processing:
    • Decimation: Drop every n-th sample after low-pass filtering to avoid aliasing.
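The connection between decimation and strided convolution can be made concrete: a stride-2 filter only visits every other position, so it computes the same thing as filtering followed by keeping every 2nd sample. A 1-D sketch with a hypothetical 2-tap moving-average filter:

```python
import numpy as np

sig = np.arange(8, dtype=float)  # hypothetical 1-D signal: [0, 1, ..., 7]

# Decimation by 2: low-pass filter first (here a simple 2-tap moving
# average), then drop every other sample.
lowpassed = np.convolve(sig, [0.5, 0.5], mode='valid')
decimated = lowpassed[::2]  # [0.5, 2.5, 4.5, 6.5]

# Strided convolution: the stride-2 filter skips every other position,
# so the downsampling happens inside the convolution itself.
kernel = np.array([0.5, 0.5])
strided = np.array([sig[i:i + 2] @ kernel
                    for i in range(0, len(sig) - 1, 2)])

print(np.allclose(strided, decimated))  # True: same result
```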

Upsampling

  1. In CNNs
    • Transpose Convolution (a.k.a. Deconvolution): Learns how to upsample by using learned kernels.
    • Nearest Neighbor: Copies the closest pixel to fill in the new pixels.
    • Bilinear/Bicubic Interpolation: Interpolates new pixel values based on neighbors.
    • Unpooling: Reverses pooling using stored indices (less common).
  2. In signal/image processing:
    • Zero Insertion + Filtering: Insert zeros between samples and apply low-pass filter to reconstruct.
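The non-learned upsampling methods above are easy to sketch in numpy; here is a minimal illustration of nearest neighbor (2×, via `repeat`) and zero insertion followed by a linear-interpolation low-pass kernel, on hypothetical values:

```python
import numpy as np

x = np.array([[1., 2.],
              [3., 4.]])

# Nearest-neighbor upsampling (2x): copy each pixel into a 2x2 block.
nn = x.repeat(2, axis=0).repeat(2, axis=1)
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]

# Zero insertion + filtering (1-D sketch): insert zeros between samples,
# then low-pass filter. The kernel [0.5, 1, 0.5] fills each inserted
# zero with the average of its two neighbors (linear interpolation).
s = np.array([1., 3., 5.])
zeros_inserted = np.zeros(2 * len(s) - 1)
zeros_inserted[::2] = s                                   # [1, 0, 3, 0, 5]
upsampled = np.convolve(zeros_inserted, [0.5, 1.0, 0.5], mode='same')
# -> [1, 2, 3, 4, 5]
```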

Pooling layer summary (CS231n Lec 5)

Hyperparameters: kernel size K, stride S, pooling function (max or avg). No learnable parameters — pooling is a fixed operator. For input W1 × H1 × D1, the output is W2 × H2 × D2, where W2 = (W1 − K)/S + 1, H2 = (H1 − K)/S + 1, and D2 = D1.

Output channels match input channels — pooling acts independently on each depth slice. Common setting: max pool with K = 2, S = 2 → 2× downsampling, no overlap.
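The output-size arithmetic is easy to check directly; a small helper (hypothetical name) for the floor-division formula:

```python
# Pooling output size: W2 = floor((W1 - K) / S) + 1, channels unchanged.
def pool_output_size(w, h, c, k, s):
    return ((w - k) // s + 1, (h - k) // s + 1, c)

# Common setting K=2, S=2 halves the spatial dims:
print(pool_output_size(224, 224, 64, 2, 2))  # (112, 112, 64)
```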

Why pool instead of strided conv? Pooling adds a small dose of local translation invariance on top of conv’s translation equivariance — a feature shifted by a pixel inside the pooling window produces the same output. Strided convolution achieves the same downsampling, but it is learned and provides only equivariance, not this local invariance.
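The local-invariance claim can be demonstrated directly: shift a feature by one pixel within the same 2×2 pooling window and the pooled output does not change. A sketch with hypothetical values:

```python
import numpy as np

def maxpool2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A "feature" (the 9) at two positions inside the same 2x2 window:
a = np.zeros((4, 4)); a[0, 0] = 9.0
b = np.zeros((4, 4)); b[1, 1] = 9.0   # shifted one pixel, same window

print((maxpool2x2(a) == maxpool2x2(b)).all())  # True: invariant
```

Shift the feature across a window boundary (e.g., to position (0, 2)) and the output does change — the invariance is only local to each pooling window.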

Modern architectures often replace pooling with strided conv (e.g. ResNet uses stride-2 convs in some stages); both are valid, just different inductive biases.

Source

CS231n Lec 5 slides 101–106 (pooling layer, max pooling example, summary, translation invariance).

Unpooling (CS231n 2025 Lec 9)

The mirror operations used in segmentation decoders to undo a pooling step:

  • Nearest Neighbor unpooling: copy each value into every cell of a block.
  • Bed of Nails: place the value in the top-left of a block, zeros elsewhere. Cheap but loses spatial structure.
  • Max Unpooling: each max-pool layer remembers the argmax position within each window during the forward pass; the matching unpool layer writes the value back to that exact position, zeros elsewhere. Preserves where the salient features were, so edges line up.

Used in encoder-decoder segmentation networks (FCN, SegNet) where each pool layer is paired with an unpool layer at the same resolution. For learnable upsampling, use Transposed Convolution instead.
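Max unpooling can be sketched in numpy: the pool step records each window's argmax, and the unpool step writes each value back to that exact position with zeros elsewhere. Function names and input values here are hypothetical.

```python
import numpy as np

def maxpool_with_indices(x):
    """2x2 max pool; also return the flat argmax index per window."""
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool(pooled, idx, out_shape):
    """Write each value back at its remembered position, zeros elsewhere."""
    out = np.zeros(out_shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = divmod(idx[i, j], 2)   # position within the 2x2 window
            out[2 * i + r, 2 * j + c] = pooled[i, j]
    return out

x = np.array([[1., 2., 6., 3.],
              [3., 5., 2., 1.],
              [1., 2., 2., 1.],
              [7., 3., 4., 8.]])
pooled, idx = maxpool_with_indices(x)     # pooled = [[5, 6], [7, 8]]
restored = max_unpool(pooled, idx, x.shape)
# restored is zero everywhere except the original argmax positions,
# so the salient features land back where they were.
```

PyTorch offers the same pairing via `nn.MaxPool2d(..., return_indices=True)` and `nn.MaxUnpool2d`.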

Source

CS231n 2025 Lec 9 slides 41–48 (in-network upsampling, NN, bed of nails, max unpooling). 2026 PDF not published β€” using 2025 fallback.