Momentum Contrast for Unsupervised Visual Representation Learning (MoCo)
Quite a seminal paper.
MoCo trains a feature encoder that maps images (or different augmented views of images) into a representation space in which:
- Two views (augmentations) of the same image are pulled close together (“positive pairs”)
- Views of different images are pushed apart (“negative pairs”)
This is the basic idea behind contrastive learning. However, there are some challenges: you need a large pool of negatives, and those negatives should be encoded by a reasonably consistent (slowly changing) encoder.
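Concretely, the objective that implements this pull/push is the InfoNCE loss (written here in MoCo's notation): for a query $q$, its positive key $k^+$, $K$ negative keys $\{k_i\}$, and temperature $\tau$,

$$
\mathcal{L}_q = -\log \frac{\exp(q \cdot k^+ / \tau)}{\exp(q \cdot k^+ / \tau) + \sum_{i=1}^{K} \exp(q \cdot k_i / \tau)}
$$

Minimizing it pulls $q$ toward $k^+$ and pushes it away from every $k_i$; it is just a $(K+1)$-way softmax cross-entropy with the positive as class 0, which is exactly how the pseudocode below computes it.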

CLIP doesn’t seem to suffer from these problems, or does it?
Walkthrough (CS231n 2025 Lec 12)
Key differences from SimCLR
SimCLR’s bottleneck: to get lots of negatives you need a huge batch (8k → TPU pods). MoCo sidesteps this by decoupling negative count from batch size:
| | SimCLR | MoCo |
|---|---|---|
| Source of negatives | other samples in current minibatch | FIFO queue of keys from many past minibatches |
| Key encoder | shared with query encoder | momentum encoder (no gradients) |
| Gradient flows | through both views | only through the query |
| Practical | needs batch 8192 on TPU | works at batch 256 on 8 V100s |
Queue mechanics
Maintain a dictionary (queue) of K keys (e.g. K = 65536). Each iteration:
- Encode current minibatch’s “key view” with the momentum encoder → enqueue.
- Dequeue the oldest minibatch’s keys.
The queue acts as a large, slowly-evolving dictionary of negatives. No gradients flow into it.
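A minimal sketch of one way to keep this queue as a fixed-size tensor with a circular write pointer (similar in spirit to the reference implementation; `feat_dim`, `queue_size`, and the function name here are illustrative, not from the slides):

```python
import torch
import torch.nn.functional as F

feat_dim, queue_size = 128, 65536               # C and K; illustrative values
queue = F.normalize(torch.randn(feat_dim, queue_size), dim=0)  # C x K buffer of negative keys
queue_ptr = 0                                   # next write position

@torch.no_grad()
def dequeue_and_enqueue(keys):                  # keys: N x C from the momentum encoder
    """Overwrite the oldest keys in the buffer with the newest minibatch."""
    global queue_ptr
    n = keys.shape[0]                           # batch size N; assumes queue_size % N == 0
    queue[:, queue_ptr:queue_ptr + n] = keys.T  # enqueue new keys (implicitly dequeues old ones)
    queue_ptr = (queue_ptr + n) % queue_size    # advance the circular pointer
```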
Momentum update
Let $\theta_q$ be the query-encoder parameters and $\theta_k$ the key-encoder parameters. Only $\theta_q$ is updated by backprop; $\theta_k$ trails it as an exponential moving average, $\theta_k \leftarrow m\,\theta_k + (1 - m)\,\theta_q$ with $m$ close to 1 (0.999 in the paper). So $\theta_k$ changes slowly, which is essential for queue consistency: the keys in the queue were encoded by slightly older versions of $\theta_k$, and a fast-moving key encoder would make them stale.
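A minimal PyTorch sketch of this update (assuming `f_q` and `f_k` are two copies of the same `nn.Module`; the function name is mine, not from the slides):

```python
import torch

m = 0.999  # momentum coefficient used in the paper

@torch.no_grad()
def momentum_update(f_q, f_k):
    """EMA of the query encoder's weights into the key encoder; f_k never receives gradients."""
    for p_q, p_k in zip(f_q.parameters(), f_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)
```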
Pseudocode (Lec 12 slide 89)
```python
for x in loader:                                   # load a minibatch of N images
    x_q, x_k = aug(x), aug(x)                      # two random augmented views
    q = f_q.forward(x_q)                           # queries: N x C
    k = f_k.forward(x_k).detach()                  # keys: N x C, no grad through the key encoder
    l_pos = bmm(q.view(N, 1, C), k.view(N, C, 1))  # positive logits: N x 1
    l_neg = mm(q.view(N, C), queue.view(C, K))     # negative logits: N x K
    logits = cat([l_pos, l_neg], dim=1)            # N x (1 + K)
    labels = zeros(N)                              # positive is at index 0
    loss = CrossEntropyLoss(logits / tau, labels)  # InfoNCE
    loss.backward()
    update(f_q.params)                             # SGD step on the query encoder only
    f_k.params = m * f_k.params + (1 - m) * f_q.params  # momentum update of the key encoder
    enqueue(queue, k); dequeue(queue)              # add newest keys, drop the oldest minibatch
```
MoCo v2 — hybrid with SimCLR
MoCo v2 (Chen et al. 2020) pulls SimCLR’s two best ideas into MoCo’s framework:
- Non-linear MLP projection head (SimCLR's g(·)); see the sketch after this list.
- Strong data augmentation (+ Gaussian blur).
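A sketch of such a projection head, a small MLP on top of the backbone's pooled features; the 2048/128 dimensions are the usual ResNet-50 settings, assumed here rather than taken from the slides:

```python
import torch.nn as nn

# 2-layer projection head in the MoCo v2 style
projection_head = nn.Sequential(
    nn.Linear(2048, 2048),   # pooled backbone features -> hidden
    nn.ReLU(inplace=True),
    nn.Linear(2048, 128),    # hidden -> contrastive embedding (the C used above)
)
```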
Ablation (Lec 12 slide 91, batch 256, 200 epochs):
| | MLP | aug+ | cos LR | ImageNet acc (%) |
|---|---|---|---|---|
| MoCo v1 | | | | 60.6 |
| +MLP | ✓ | | | 66.2 |
| +aug+ | | ✓ | | 63.4 |
| +all | ✓ | ✓ | ✓ | 67.5 |
| +800 ep | ✓ | ✓ | ✓ | 71.1 |
MoCo v2 (67.5%) beats SimCLR at batch 8192 (66.6%) while using batch 256, no TPU required. Memory: ~5 GB for MoCo vs ~93 GB for end-to-end (SimCLR-style) training at batch 4096.
MoCo v3 (Chen, Xie, He 2021)
“This paper does not describe a novel method.” Adapts MoCo to ViT backbones, studies training stability. ViT-BN-L/7 hits 81.0% linear probe / 84.1% fine-tune.
Source
CS231n 2025 Lec 12 slides ~87–93, 102 (MoCo architecture, momentum update, pseudocode, MoCo-v2 ablation table, MoCo-v2 vs SimCLR table, memory comparison, MoCo-v3).