Probing (ML)
Saw this in the V-JEPA paper.
Linear Probing
You freeze the backbone (e.g., ViT, ResNet) and train a linear classifier on top of the frozen embeddings. This tests whether the downstream classes are linearly separable in the frozen embedding space.
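A minimal PyTorch sketch of the recipe: freeze every backbone parameter and train only a linear head on its embeddings. The ResNet-18 backbone, the 10-class task, and the hyperparameters are illustrative assumptions; in practice you would load the pretrained (e.g., self-supervised) weights you want to evaluate.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Backbone to probe; load your pretrained/SSL checkpoint here (weights=None keeps the sketch offline).
backbone = resnet18(weights=None)
feat_dim = backbone.fc.in_features          # 512 for ResNet-18
backbone.fc = nn.Identity()                 # strip the classifier so it emits embeddings
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False                 # backbone stays frozen

# The linear probe is the only trainable module.
num_classes = 10                            # assumption: a 10-class downstream task
probe = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def probe_step(images, labels):
    with torch.no_grad():                   # no gradients through the backbone
        feats = backbone(images)            # (B, feat_dim)
    logits = probe(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```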
Attentive Probing
Instead of using a single global representation (like [CLS]), you let the probe attend across all patch/token embeddings. You add a small attention layer (typically cross-attention with a learnable query, or self-attention) that learns to focus on the parts of the representation most relevant to the target task, and then still use a shallow head (often linear) for classification.
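A minimal sketch of such a probe: a learnable query cross-attends over the frozen backbone's token embeddings, and a linear head classifies the pooled vector. The dimensions (196 tokens, embed_dim=768, 12 heads) and the exact layer layout are illustrative assumptions, not V-JEPA's precise configuration.

```python
import torch
import torch.nn as nn

class AttentiveProbe(nn.Module):
    """Cross-attention pooling over frozen token embeddings, then a linear head."""
    def __init__(self, embed_dim=768, num_heads=12, num_classes=10):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, embed_dim))   # learnable pooling query
        nn.init.trunc_normal_(self.query, std=0.02)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens):
        # tokens: (B, N, D) patch/token embeddings from the frozen backbone
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)   # query attends over all tokens
        pooled = self.norm(pooled.squeeze(1))      # (B, D)
        return self.head(pooled)

# Usage: only the probe is trained; the backbone that produced `tokens` stays frozen.
tokens = torch.randn(8, 196, 768)                  # e.g. 14x14 patches from a ViT-B/16
probe = AttentiveProbe()
logits = probe(tokens)                             # (8, num_classes)
```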