🛠️ Steven Gong

Mar 17, 2025, 1 min read

Contrastive Language-Image Pre-Training (CLIP)

CLIP was trained on roughly 400 million image-caption pairs scraped from the internet.

Resources:

  • https://arxiv.org/pdf/2103.00020
  • https://medium.com/one-minute-machine-learning/clip-paper-explained-easily-in-3-levels-of-detail-61959814ad13

CLIP is a model that scores how well a given image and a given text caption fit together. It jointly trains an image encoder and a text encoder with a contrastive objective, so that embeddings of matching image-caption pairs have high cosine similarity and mismatched pairs have low similarity.
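The contrastive objective can be sketched in a few lines of numpy (following the pseudocode style of the paper). This is a minimal illustration, not CLIP's actual implementation: given a batch of image and text embeddings, matching pairs lie on the diagonal of the similarity matrix, and a symmetric cross-entropy loss pulls those diagonal entries up. The `temperature` value here is an assumed default.

```python
import numpy as np

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    # L2-normalize so the dot product is cosine similarity
    I = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    T = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = I @ T.T / temperature      # (N, N) pairwise similarity matrix
    labels = np.arange(len(I))          # matching pairs sit on the diagonal

    def xent(l):
        # numerically stable cross-entropy toward the diagonal labels
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text (rows) and text->image (columns) losses
    return (xent(logits) + xent(logits.T)) / 2
```

A batch where each image embedding equals its caption embedding gives a near-zero loss, while shuffling the captions drives the loss up, which is exactly the signal that teaches the two encoders to align.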

Related

  • SigLIP

