Contrastive Loss
Siamese Contrastive Loss
$$\mathcal{L} = y\, D^2 + (1 - y)\, \max(0,\; m - D)^2$$
- $y = 1$ if positive pair, $y = 0$ if negative pair
- $D = \lVert f(x_1) - f(x_2) \rVert_2$ is the Euclidean distance between the two embeddings
- $m$ is the margin: how far apart negative pairs must be
So if they’re a positive pair, we minimize the left term, pulling the embeddings together. If they’re a negative pair, we minimize the right term, pushing them at least $m$ apart.
Used, for example, in Siamese networks; CLIP uses the softmax-based variant described next.
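As a concrete reference, here is a minimal PyTorch sketch of the margin-based loss above (the function name and default margin are illustrative assumptions, not from a specific library):

```python
import torch
import torch.nn.functional as F

def siamese_contrastive_loss(emb1, emb2, y, margin=1.0):
    """Margin-based contrastive loss.

    emb1, emb2: (batch, dim) embeddings of the two inputs
    y: (batch,) labels, 1 for positive pairs, 0 for negative pairs
    margin: how far apart negative pairs must be (illustrative default)
    """
    # Euclidean distance D between the paired embeddings
    d = F.pairwise_distance(emb1, emb2)
    # Positive pairs: minimize D^2, pulling the embeddings together
    pos = y * d.pow(2)
    # Negative pairs: penalize only if closer than the margin
    neg = (1 - y) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()
```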
CLIP Contrastive Loss (InfoNCE)
Given a batch of $N$ image–text pairs $(x_i^{\text{img}}, x_i^{\text{txt}})$:
- Image encoder: $u_i = f(x_i^{\text{img}})$
- Text encoder: $v_i = g(x_i^{\text{txt}})$
- Normalize to unit vectors: $\hat{u}_i = u_i / \lVert u_i \rVert$, $\hat{v}_i = v_i / \lVert v_i \rVert$
- Similarity score: $s_{ij} = \hat{u}_i \cdot \hat{v}_j / \tau$
Image → Text loss:
$$\mathcal{L}_{\text{i} \to \text{t}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(s_{ii})}{\sum_{j=1}^{N} \exp(s_{ij})}$$
- the denominator sums over all texts in the batch
Text → Image loss:
$$\mathcal{L}_{\text{t} \to \text{i}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(s_{ii})}{\sum_{j=1}^{N} \exp(s_{ji})}$$
- the denominator sums over all images in the batch
Total CLIP loss (symmetric):
$$\mathcal{L}_{\text{CLIP}} = \frac{1}{2}\left(\mathcal{L}_{\text{i} \to \text{t}} + \mathcal{L}_{\text{t} \to \text{i}}\right)$$
- $\tau$: learnable temperature parameter.
- Numerator: similarity of true image–text pair.
- Denominator: similarity across all candidates in batch.
- Effect: each image must “classify” its paired text, and vice versa.
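A minimal PyTorch sketch of this symmetric InfoNCE loss, assuming the image and text embeddings are already computed (the function name and the way the temperature is parameterized are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, log_temp):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (N, dim) outputs of the image/text encoders
    log_temp: learnable scalar; temperature tau = exp(log_temp) for stability
    """
    # Normalize to unit vectors
    u = F.normalize(image_emb, dim=-1)
    v = F.normalize(text_emb, dim=-1)
    # Similarity matrix s_ij = u_i . v_j / tau
    logits = (u @ v.t()) / torch.exp(log_temp)
    # The matched text for image i sits at column i (the diagonal)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Image -> text: each image must classify its paired text
    loss_i2t = F.cross_entropy(logits, targets)
    # Text -> image: each text must classify its paired image
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```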
SigLIP
SigLIP removes the softmax over the batch and instead treats every image–text pair as an independent binary classification: matched pairs are positives, all other pairs in the batch are negatives, scored with a sigmoid + binary cross-entropy formulation.
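A minimal sketch of that sigmoid formulation, reusing the normalized embeddings from above (the learnable temperature and bias scalars follow the SigLIP setup, but the function name and parameterization here are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def siglip_loss(image_emb, text_emb, log_temp, bias):
    """Pairwise sigmoid loss: every (image, text) pair in the batch is a
    binary example; only the diagonal (matched) pairs are positives.

    image_emb, text_emb: (N, dim) embeddings
    log_temp, bias: learnable scalars; the similarity is scaled by exp(log_temp)
    """
    u = F.normalize(image_emb, dim=-1)
    v = F.normalize(text_emb, dim=-1)
    # Scaled, shifted similarity logits for all N x N pairs
    logits = (u @ v.t()) * torch.exp(log_temp) + bias
    # Labels: +1 on the diagonal (matched pairs), -1 everywhere else
    n = logits.size(0)
    labels = 2 * torch.eye(n, device=logits.device) - 1
    # Binary cross-entropy written as -log sigmoid(label * logit),
    # averaged over the batch dimension
    return -F.logsigmoid(labels * logits).sum() / n
```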