Convolution

Pooling

Pooling has no learnable weights; it only has hyperparameters: filter size, stride, and type (max or average).

  • Max Pooling β†’ Take the max within each filter

Break the input into smaller regions; max pooling just takes the maximum value in each region.

Intuition: as long as a feature is detected anywhere within one of these regions, it is preserved in the output of max pooling.

  • Average Pooling β†’ Take the average within each filter
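Both operations can be sketched in a few lines of numpy; this is a minimal illustration with a hypothetical 4×4 input and non-overlapping 2×2 regions, using a reshape trick to group each region's four values along one axis.

```python
import numpy as np

# Hypothetical 4x4 input feature map
x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 8, 3],
              [1, 9, 4, 4]], dtype=float)

# Break the input into non-overlapping 2x2 regions: reshape so each
# region's 4 values end up along the last axis.
regions = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

max_pooled = regions.max(axis=-1)   # max within each 2x2 region
avg_pooled = regions.mean(axis=-1)  # average within each 2x2 region

print(max_pooled)  # [[6. 5.] [9. 8.]]
print(avg_pooled)  # [[3.5  2.5 ] [4.75 4.75]]
```

Real frameworks (e.g., `torch.nn.MaxPool2d`) do the same thing per channel, with stride and kernel size as hyperparameters.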

This reduces the size or resolution of a signal (e.g., an image):

Downsampling

  1. In CNNs (e.g., U-Net):
    • Max Pooling: Keeps the largest value in a region (e.g., 2Γ—2).
    • Average Pooling: Takes the average of values in the region.
    • Strided Convolution: A convolution operation with a stride > 1 (e.g., 2), which skips pixels.
  2. In signal/image processing:
    • Decimation: Drop every n-th sample after low-pass filtering to avoid aliasing.
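The connection between decimation and strided convolution can be made concrete: a stride-2 filter only visits every other position, so it computes the same thing as filtering followed by keeping every 2nd sample. A 1-D sketch with a hypothetical 2-tap moving-average filter:

```python
import numpy as np

sig = np.arange(8, dtype=float)  # hypothetical 1-D signal: [0, 1, ..., 7]

# Decimation by 2: low-pass filter first (here a simple 2-tap moving
# average), then drop every other sample.
lowpassed = np.convolve(sig, [0.5, 0.5], mode='valid')
decimated = lowpassed[::2]  # [0.5, 2.5, 4.5, 6.5]

# Strided convolution: the stride-2 filter skips every other position,
# so the downsampling happens inside the convolution itself.
kernel = np.array([0.5, 0.5])
strided = np.array([sig[i:i + 2] @ kernel
                    for i in range(0, len(sig) - 1, 2)])

print(np.allclose(strided, decimated))  # True: same result
```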

Upsampling

  1. In CNNs
    • Transpose Convolution (a.k.a. Deconvolution): Learns how to upsample by using learned kernels.
    • Nearest Neighbor: Copies the closest pixel to fill in the new pixels.
    • Bilinear/Bicubic Interpolation: Interpolates new pixel values based on neighbors.
    • Unpooling: Reverses pooling using stored indices (less common).
  2. In signal/image processing:
    • Zero Insertion + Filtering: Insert zeros between samples and apply low-pass filter to reconstruct.
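The non-learned upsampling methods above are easy to sketch in numpy; here is a minimal illustration of nearest neighbor (2×, via `repeat`) and zero insertion followed by a linear-interpolation low-pass kernel, on hypothetical values:

```python
import numpy as np

x = np.array([[1., 2.],
              [3., 4.]])

# Nearest-neighbor upsampling (2x): copy each pixel into a 2x2 block.
nn = x.repeat(2, axis=0).repeat(2, axis=1)
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]

# Zero insertion + filtering (1-D sketch): insert zeros between samples,
# then low-pass filter. The kernel [0.5, 1, 0.5] fills each inserted
# zero with the average of its two neighbors (linear interpolation).
s = np.array([1., 3., 5.])
zeros_inserted = np.zeros(2 * len(s) - 1)
zeros_inserted[::2] = s                                   # [1, 0, 3, 0, 5]
upsampled = np.convolve(zeros_inserted, [0.5, 1.0, 0.5], mode='same')
# -> [1, 2, 3, 4, 5]
```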

Pooling layer summary (CS231n Lec 5)

Hyperparameters: kernel size K, stride S, pooling function (max or avg). No learnable parameters — pooling is a fixed operator. For input W1 × H1 × D1, the output is W2 × H2 × D2, where W2 = (W1 − K)/S + 1, H2 = (H1 − K)/S + 1, and D2 = D1.

Output channels match input channels — pooling acts independently on each depth slice. Common setting: max pool with K = 2, S = 2 → 2× downsampling, no overlap.
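The output-size arithmetic is easy to check directly; a small helper (hypothetical name) for the floor-division formula:

```python
# Pooling output size: W2 = floor((W1 - K) / S) + 1, channels unchanged.
def pool_output_size(w, h, c, k, s):
    return ((w - k) // s + 1, (h - k) // s + 1, c)

# Common setting K=2, S=2 halves the spatial dims:
print(pool_output_size(224, 224, 64, 2, 2))  # (112, 112, 64)
```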

Why pool instead of strided conv? Pooling adds a small dose of local translation invariance on top of conv’s translation equivariance — a feature shifted by a pixel inside the pooling window produces the same output. Strided convolution achieves the same downsampling, but it is learned and provides only equivariance, not this local invariance.
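The local-invariance claim can be demonstrated directly: shift a feature by one pixel within the same 2×2 pooling window and the pooled output does not change. A sketch with hypothetical values:

```python
import numpy as np

def maxpool2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A "feature" (the 9) at two positions inside the same 2x2 window:
a = np.zeros((4, 4)); a[0, 0] = 9.0
b = np.zeros((4, 4)); b[1, 1] = 9.0   # shifted one pixel, same window

print((maxpool2x2(a) == maxpool2x2(b)).all())  # True: invariant
```

Shift the feature across a window boundary (e.g., to position (0, 2)) and the output does change — the invariance is only local to each pooling window.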

Modern architectures often replace pooling with strided conv (e.g. ResNet uses stride-2 convs in some stages); both are valid, just different inductive biases.

Source

CS231n Lec 5 slides 101–106 (pooling layer, max pooling example, summary, translation invariance).

Unpooling (CS231n 2025 Lec 9)

The mirror operations used in segmentation decoders to undo a pooling step:

  • Nearest Neighbor unpooling: copy each value into every cell of a block.
  • Bed of Nails: place the value in the top-left of a block, zeros elsewhere. Cheap but loses spatial structure.
  • Max Unpooling: each max-pool layer remembers the argmax position within each window during the forward pass; the matching unpool layer writes the value back to that exact position, zeros elsewhere. Preserves where the salient features were, so edges line up.

Used in encoder-decoder segmentation networks (FCN, SegNet) where each pool layer is paired with an unpool layer at the same resolution. For learnable upsampling, use Transposed Convolution instead.
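Max unpooling can be sketched in numpy: the pool step records each window's argmax, and the unpool step writes each value back to that exact position with zeros elsewhere. Function names and input values here are hypothetical.

```python
import numpy as np

def maxpool_with_indices(x):
    """2x2 max pool; also return the flat argmax index per window."""
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool(pooled, idx, out_shape):
    """Write each value back at its remembered position, zeros elsewhere."""
    out = np.zeros(out_shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = divmod(idx[i, j], 2)   # position within the 2x2 window
            out[2 * i + r, 2 * j + c] = pooled[i, j]
    return out

x = np.array([[1., 2., 6., 3.],
              [3., 5., 2., 1.],
              [1., 2., 2., 1.],
              [7., 3., 4., 8.]])
pooled, idx = maxpool_with_indices(x)     # pooled = [[5, 6], [7, 8]]
restored = max_unpool(pooled, idx, x.shape)
# restored is zero everywhere except the original argmax positions,
# so the salient features land back where they were.
```

PyTorch offers the same pairing via `nn.MaxPool2d(..., return_indices=True)` and `nn.MaxUnpool2d`.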

Source

CS231n 2025 Lec 9 slides 41–48 (in-network upsampling, NN, bed of nails, max unpooling). 2026 PDF not published β€” using 2025 fallback.