Feature Visualization
What does a CNN actually look at? A grab-bag of techniques for inspecting trained CNNs: where in the image the model attends, what each neuron responds to, which pixels swing the prediction.
Why care?
A trained classifier is a black box that outputs a class label. Feature visualization opens it up: it reveals failure modes (the network classifies "wolf" because of the snowy background, not the animal), validates that the model learned semantically meaningful features, and powers downstream tools like weakly-supervised localization (object localization for free, just from classification labels).
Tier 1: Looking at weights directly
First-layer filters
The first conv layer's filters have shape $K \times K \times 3$, so they are directly visualizable as RGB images. For AlexNet (96 filters, 11×11×3), ResNet-18/101 (64 filters, 7×7×3), and DenseNet-121 (64 filters, 7×7×3), the learned filters look almost identical: oriented edges, opposing colors, frequency-tuned blobs. Strong evidence that low-level vision is a converged problem.
Doesn't generalize past the first layer: deeper conv weights operate in feature space (not pixel space), so direct visualization is uninformative.
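A small numpy helper for tiling a first-layer weight tensor into one displayable image. The shape (64, 3, 7, 7) below mirrors a ResNet-style conv1 but the values are random placeholders; with a real model you would pass a detached copy of the first conv layer's weights instead.

```python
import numpy as np

def filter_grid(w, pad=1):
    """Tile conv filters of shape (N, 3, k, k) into one (H, W, 3) image in [0, 1]."""
    n, c, k, _ = w.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    # White background; filters separated by `pad` pixels.
    grid = np.ones((rows * (k + pad) - pad, cols * (k + pad) - pad, c))
    for i in range(n):
        f = w[i].transpose(1, 2, 0)                      # (k, k, 3) for display
        f = (f - f.min()) / (f.max() - f.min() + 1e-8)   # per-filter min-max -> [0, 1]
        r, col = divmod(i, cols)
        grid[r*(k+pad):r*(k+pad)+k, col*(k+pad):col*(k+pad)+k] = f
    return grid

# Random weights stand in for model.conv1.weight here.
demo = filter_grid(np.random.default_rng(0).normal(size=(64, 3, 7, 7)))
```

The per-filter min-max normalization is what makes the edge/color structure visible: raw conv weights are small and signed, so without rescaling the image would be near-uniform gray.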
Tier 2: Saliency via backprop (Simonyan, Vedaldi, Zisserman, ICLR Workshop 2014)
Question: which input pixels matter most for the predicted class?
Recipe:
- Forward pass to get the class score (use the unnormalized score, not the softmax probability; the softmax gradient mixes in the other classes, so the probability can rise by suppressing competitors rather than boosting the target).
- Backprop the score to the image pixels to get the gradient of the score with respect to each pixel.
- Take the absolute value, then max over the RGB channels → a 2D saliency map.
The bright spots show pixels that, if perturbed, would most change the score. Rough-but-cheap object localization falls out of a classifier trained only with image-level labels.
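A toy numpy sketch of the recipe, using a linear "classifier" so the pixel gradient has a closed form; with a real CNN, autograd would supply the gradient and the last two lines would be identical.

```python
import numpy as np

# Toy stand-in for the network: a linear class score S_c(x) = <W_c, x>,
# whose gradient w.r.t. the input is simply W_c.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 3
x   = rng.normal(size=(H, W, C))      # input "image"
W_c = rng.normal(size=(H, W, C))      # weights of the unnormalized class score

score = float((W_c * x).sum())        # forward: raw class score, NOT softmax prob
grad  = W_c                           # backward: dS_c / dx for the linear model

saliency = np.abs(grad).max(axis=-1)  # |gradient|, max over RGB -> (H, W) heatmap
```

The final map has one nonnegative value per pixel, exactly the 2D saliency map described above.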
Guided Backprop (Springenberg et al. ICLR Workshop 2015)
Standard backprop through a ReLU passes gradient only where the forward activation was positive. Guided backprop adds a second filter: also zero out gradients where the gradient itself is negative. So only positive gradients flowing through positive activations get through.
Input           Forward (ReLU)    Backward (standard)   Backward (guided)
[ 1 -1  5]      [1 0 5]           [-2  0 -1]            [0 0 0]
[ 2 -5 -7]  →   [2 0 0]           [ 6  0  0]            [6 0 0]
[-3  2  4]      [0 2 4]           [ 0 -1  3]            [0 0 3]
Visually much cleaner: produces sharp, recognizable visualizations of what each intermediate neuron "looks for".
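The two-mask rule is easy to check numerically. A numpy sketch reproducing the worked example above (the upstream-gradient values at the masked positions are made up, since only the post-mask result appears in the table):

```python
import numpy as np

x = np.array([[ 1, -1,  5],
              [ 2, -5, -7],
              [-3,  2,  4]], dtype=float)   # pre-ReLU activations (forward)
g = np.array([[-2,  7, -1],
              [ 6,  8,  9],
              [ 5, -1,  3]], dtype=float)   # gradient arriving from the layer above

relu_mask  = (x > 0)                 # standard ReLU backward: pass where act > 0
g_standard = g * relu_mask           # -> [[-2, 0, -1], [6, 0, 0], [0, -1, 3]]
g_guided   = g_standard * (g > 0)    # -> [[ 0, 0,  0], [6, 0, 0], [0,  0, 3]]
```

Both masks are elementwise, so guided backprop costs essentially nothing extra over a normal backward pass.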
Tier 3: Class Activation Mapping (CAM; Zhou et al. CVPR 2016)
CAM only works on architectures that end with Global Average Pool → single FC → softmax (e.g. ResNet, GoogLeNet). The trick exploits that GAP commutes with the FC layer.
Setup. The last conv layer outputs features $f_k(x, y)$ (channel $k$, spatial location $(x, y)$); GAP followed by the FC layer gives the class score

$$S_c = \sum_k w_k^c \sum_{x,y} f_k(x, y)$$

The class activation map is the inner sum before the spatial pooling:

$$M_c(x, y) = \sum_k w_k^c f_k(x, y), \qquad \text{so} \qquad S_c = \sum_{x,y} M_c(x, y)$$
Up-sample $M_c$ to image resolution and overlay as a heatmap: "for class $c$, here's where in the image the evidence came from". Discriminative localization with no localization labels.
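A minimal numpy sketch of the CAM computation. Shapes and values are random placeholders; GAP is implemented as a mean here (the sum version in the derivation differs only by the constant number of spatial locations), so averaging the map recovers the score exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W = 4, 7, 7
f = rng.normal(size=(K, H, W))        # last-conv features f_k(x, y)
w = rng.normal(size=(K,))             # FC weights w_k^c for one class c

gap = f.mean(axis=(1, 2))             # GAP: one scalar per channel, shape (K,)
S_c = w @ gap                         # class score

M_c = np.tensordot(w, f, axes=1)      # CAM: weighted channel sum, shape (H, W)
# GAP commutes with the FC layer: pooling the map gives back the score.
```

In practice you would then bilinearly up-sample `M_c` to the input resolution and overlay it on the image.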
Limitation: CAM only applies to the last conv layer (because the derivation requires the GAP-then-FC structure). Architectures without that head canβt use CAM directly.
Tier 4: Grad-CAM (Selvaraju et al. CVPR 2017)
Generalizes CAM to any layer, any architecture, by replacing the analytical FC weights with gradients.
Recipe:
- Pick any layer with activations $A^k$ (channel $k$, spatial indices $i, j$).
- Compute the gradient of the class score with respect to those activations, $\frac{\partial S_c}{\partial A^k}$.
- Global-average-pool the gradients to get per-channel weights: $\alpha_k^c = \frac{1}{Z} \sum_{i,j} \frac{\partial S_c}{\partial A_{ij}^k}$, where $Z$ is the number of spatial locations.
- Weighted combination of activations, then ReLU (only show evidence for, not against, the class): $L^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$
The $\alpha_k^c$ encode "how important is channel $k$ for class $c$" via gradient magnitude, replacing CAM's analytic $w_k^c$. Grad-CAM reduces to standard CAM in the GAP-then-FC case.
Why ReLU? Without it, the map would also highlight regions that suppress the class, which is visually confusing. ReLU keeps only positive contributions.
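The recipe in numpy, with a random array standing in for the gradients that autograd would compute against the chosen layer (all shapes are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
K, H, W = 8, 7, 7
A     = rng.normal(size=(K, H, W))   # activations A^k of the chosen layer
grads = rng.normal(size=(K, H, W))   # dS_c / dA, from autograd in a real framework

alpha = grads.mean(axis=(1, 2))      # GAP the gradients -> alpha_k^c, shape (K,)

# Weighted channel combination, then ReLU to keep only positive evidence.
L_c = np.maximum(0.0, np.tensordot(alpha, A, axes=1))   # shape (H, W)
```

As with CAM, the resulting map is then up-sampled to input resolution for overlay; unlike CAM, nothing here assumed a GAP-then-FC head.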
Why ViTs are different
ViT doesn't have conv feature maps in the same way, but you can visualize attention weights from the [CLS] token (or pooled token) to all patches, which serves the same purpose: which patches contributed to the prediction. Often cleaner than Grad-CAM, because attention is itself a learned spatial weighting.
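A shape-level sketch of pulling the [CLS] row out of one head's attention matrix. The 14×14 patch grid (196 patch tokens plus one [CLS] token, i.e. 197 total) is an assumed ViT-Base-style configuration, and random values stand in for the model's real attention.

```python
import numpy as np

T, P = 197, 14                                  # tokens = 1 CLS + P*P patches
rng = np.random.default_rng(2)

# Fake softmax-style attention: nonnegative, rows sum to 1.
attn = np.abs(rng.normal(size=(T, T)))
attn /= attn.sum(axis=-1, keepdims=True)

# Row 0 is the CLS token's attention; drop the CLS->CLS entry, map to the grid.
cls_to_patches = attn[0, 1:].reshape(P, P)      # (14, 14) spatial heatmap
```

The reshaped map is directly overlayable on the image, one cell per patch, with no gradients required.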
Source
CS231n 2025 Lecture 9, slides 127–147 and 175–178 (first-layer filters, saliency via backprop, CAM derivation, Grad-CAM, intermediate features via guided backprop, ViT attention visualization). 2026 PDF not published; using the 2025 fallback (April 29, 2025).