Contrastive Learning

Contrastive Loss

Siamese Contrastive Loss

$$\mathcal{L}(x_1, x_2, y) = y\, D^2 + (1 - y)\, \max(0,\, m - D)^2$$

  • $y = 1$ if positive pair, $y = 0$ if negative pair
  • $D = \lVert f(x_1) - f(x_2) \rVert_2$ is the Euclidean distance between the two embeddings
  • $m$ is the margin, how far apart negative pairs must be

So if they’re a positive pair, we minimize the left term (the squared distance). If they’re a negative pair, they must be at least $m$ apart before the right term stops penalizing them.

This pairwise formulation is used, for example, in Siamese networks; CLIP uses a related batch-wise contrastive loss (InfoNCE), described in the next section.
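A minimal PyTorch sketch of this loss, following the $y = 1$ convention above; the function name and default margin are illustrative:

```python
import torch
import torch.nn.functional as F

def siamese_contrastive_loss(z1, z2, y, margin=1.0):
    """Pairwise contrastive loss for a Siamese network.

    z1, z2: embeddings from the two branches, shape (B, d)
    y:      (B,) float tensor, 1 for positive pairs, 0 for negative pairs
    """
    d = F.pairwise_distance(z1, z2)                # Euclidean distance D per pair
    pos = y * d.pow(2)                             # pull positive pairs together
    neg = (1 - y) * F.relu(margin - d).pow(2)      # push negatives at least `margin` apart
    return (pos + neg).mean()
```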

CLIP Contrastive Loss (InfoNCE)

Given a batch of $N$ image–text pairs $\{(I_i, T_i)\}_{i=1}^{N}$:

  • Image encoder: $u_i = f(I_i)$
  • Text encoder: $v_i = g(T_i)$
  • Normalize to unit vectors: $\hat{u}_i = u_i / \lVert u_i \rVert$, $\hat{v}_i = v_i / \lVert v_i \rVert$
  • Similarity score: $s_{ij} = \hat{u}_i^\top \hat{v}_j / \tau$

Image → Text loss:

$$\mathcal{L}_{I \to T} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(s_{ii})}{\sum_{j=1}^{N} \exp(s_{ij})}$$

  • the denominator sums over all texts in the batch

Text → Image loss:

$$\mathcal{L}_{T \to I} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(s_{ii})}{\sum_{j=1}^{N} \exp(s_{ji})}$$

  • the denominator sums over all images in the batch

Total CLIP loss (symmetric):

$$\mathcal{L} = \frac{1}{2} \left( \mathcal{L}_{I \to T} + \mathcal{L}_{T \to I} \right)$$

  • $\tau$: learnable temperature parameter.
  • Numerator: similarity of the true image–text pair.
  • Denominator: sum over all candidate pairings in the batch.
  • Effect: each image must “classify” its paired text, and vice versa.
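A minimal PyTorch sketch of the symmetric loss above; names are illustrative, and the temperature is parameterized as a learnable logit scale ($\log(1/\tau)$), as in the CLIP paper:

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, logit_scale):
    """Symmetric InfoNCE over N matched image–text pairs.

    image_emb, text_emb: (N, d) raw encoder outputs
    logit_scale:         learnable scalar tensor, log(1/tau)
    """
    u = F.normalize(image_emb, dim=-1)               # unit vectors u_hat
    v = F.normalize(text_emb, dim=-1)                # unit vectors v_hat
    logits = logit_scale.exp() * (u @ v.t())         # s_ij = u_hat_i . v_hat_j / tau
    targets = torch.arange(len(u), device=u.device)  # true pairs sit on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)      # each image classifies its text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # each text classifies its image
    return 0.5 * (loss_i2t + loss_t2i)
```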

SigLIP

SigLIP removes the softmax and instead uses a sigmoid + binary cross-entropy formulation: every image–text pair in the batch is scored independently as match vs. non-match, so the loss needs no normalization across the batch and scales better to very large batches.
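One common way to write it, with $z_{ij} = +1$ for matched pairs ($i = j$) and $-1$ otherwise, a learnable temperature $t$, and a learnable bias $b$:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \log \sigma\!\big(z_{ij}\,(t\, \hat{u}_i^\top \hat{v}_j + b)\big)$$

A PyTorch sketch under the same tensor conventions as the CLIP example (names are illustrative):

```python
import torch
import torch.nn.functional as F

def siglip_loss(image_emb, text_emb, t, b):
    """Pairwise sigmoid loss: every (i, j) pair in the batch is an
    independent binary classification, match (i == j) vs. non-match.

    t, b: learnable scalar temperature and bias tensors
    """
    u = F.normalize(image_emb, dim=-1)
    v = F.normalize(text_emb, dim=-1)
    logits = t * (u @ v.t()) + b                          # (N, N) pair logits
    labels = 2 * torch.eye(len(u), device=u.device) - 1   # +1 on diagonal, -1 off
    # -log sigmoid(z_ij * logit_ij), summed over pairs, averaged over the batch
    return -F.logsigmoid(labels * logits).sum(dim=-1).mean()
```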