Probing (ML)

Saw this in the V-JEPA paper.

Linear Probing

You freeze the backbone (e.g., a ViT or ResNet) and train only a linear classifier on top of the frozen embeddings. This tests whether the representation is linearly separable for the downstream task, i.e., how much task-relevant information the features expose without any fine-tuning.
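
A minimal sketch of the idea, assuming a torchvision ViT as the frozen backbone; the dataset/loader and class count are placeholders:

import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

backbone = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
backbone.heads = nn.Identity()              # drop the original classification head
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False                 # freeze the backbone

num_classes = 10                            # placeholder
probe = nn.Linear(768, num_classes)         # the only trainable part
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for images, labels in train_loader:         # train_loader assumed to exist
    with torch.no_grad():
        feats = backbone(images)            # frozen embeddings, e.g. [B, 768]
    loss = criterion(probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()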

Attentive Probing

Instead of using a single global representation (like the [CLS] token), you let the probe attend across all patch/token embeddings. You add a small self-attention (or cross-attention) layer that learns to focus on the parts of the representation most relevant to the target task, and its output still goes through a shallow head (often just a linear layer) for classification.
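
A sketch of one common variant: a single learnable query cross-attends over the frozen token embeddings, then a linear head classifies the pooled vector. This assumes the backbone exposes per-token features of shape [B, N, D]; names and hyperparameters here are illustrative, not the exact V-JEPA setup.

import torch
import torch.nn as nn

class AttentiveProbe(nn.Module):
    def __init__(self, dim: int, num_classes: int, num_heads: int = 8):
        super().__init__()
        # one learnable query pools the token sequence via cross-attention
        self.query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)    # shallow classification head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: [B, N, D] frozen patch/token embeddings
        q = self.query.expand(tokens.size(0), -1, -1)   # [B, 1, D]
        pooled, _ = self.attn(q, tokens, tokens)        # attend over all tokens
        return self.head(self.norm(pooled.squeeze(1)))

probe = AttentiveProbe(dim=768, num_classes=10)
tokens = torch.randn(4, 196, 768)    # stand-in for frozen ViT patch embeddings
logits = probe(tokens)               # [4, 10]

The backbone stays frozen as in linear probing; only the attention layer and head are trained, so the probe stays cheap while being able to pick out task-relevant tokens.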