Self-Supervised Learning

Contrastive Learning

First seen here: https://github.com/wjf5203/VNext Oh, I actually found out about this when I spoke with the people at Cohere AI at HackWestern.

https://www.youtube.com/watch?v=7l6fttRJzeU&t=318s&ab_channel=ArtificialIntelligence

The goal of contrastive representation learning is to learn an embedding space in which similar sample pairs stay close to each other, while dissimilar ones are far apart.

We have the following categorization:

  1. Inter-sample classification (most dominant)
    • Given both similar (“positive”) and dissimilar (“negative”) candidates, identifying which ones are similar to the anchor data point becomes a classification task
  2. Feature Clustering
    • Find similar data samples by clustering them with learned features
  3. Multiview coding
    • Apply the InfoNCE objective to two or more different views of input data
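Since inter-sample classification boils down to "softmax over similarities, with the positive as the correct class," here is a minimal sketch of the InfoNCE objective for a single anchor (NumPy only; the function name, the use of dot-product similarity on pre-normalized vectors, and the temperature value are my assumptions, not from the source):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax probability of the
    positive, against (1 + K) candidates. All inputs are assumed to
    be L2-normalized embedding vectors, so dot product = cosine sim."""
    candidates = np.vstack([positive[None, :], negatives])  # (1+K, d)
    logits = candidates @ anchor / temperature              # scaled similarities
    logits -= logits.max()                                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # the positive sits at index 0
```

The loss is small when the anchor is much closer to its positive than to any negative, which is exactly the "identify the similar candidate" classification framing above.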

The CLIP model enables Zero-Shot classification.
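CLIP's zero-shot classification reduces to "pick the label whose text embedding is most similar to the image embedding." A toy sketch of that final step (the function name and the toy embeddings are made up; real CLIP produces the embeddings with its image and text encoders, which are omitted here):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Return the label whose text embedding has the highest cosine
    similarity to the image embedding. Assumes embeddings already
    live in a shared space, as CLIP's encoders provide."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb          # cosine similarity per label
    return labels[int(np.argmax(sims))]
```

No label-specific training is needed at classification time; adding a class is just adding another text embedding, which is what makes it "zero-shot."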

There are creative ways to construct a set of data point candidates:

  1. The original input and its distorted version
  2. Data that captures the same target from different views
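The first construction (original + distorted version) can be sketched in one line; here Gaussian noise stands in for real augmentations like cropping or color jitter (the noise model and function name are my assumptions):

```python
import numpy as np

def make_positive_pair(x, noise_scale=0.1, rng=None):
    """Build a 'positive' candidate pair: the original sample and a
    lightly distorted copy. Gaussian noise is a stand-in for real
    data augmentations (crop, flip, color jitter, ...)."""
    rng = rng or np.random.default_rng()
    return x, x + noise_scale * rng.normal(size=x.shape)
```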

Some common loss functions:

  • Contrastive Loss (works with a labelled dataset)
  • Triplet Loss
  • N-Pair Loss (generalizes triplet loss)
  • Lifted Structured Loss
  • Noise Contrastive Estimation (NCE)
  • InfoNCE
  • Soft-Nearest Neighbors Loss
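As a concrete example from the list, triplet loss is the easiest to write down: pull the positive within a margin closer to the anchor than the negative (Euclidean distance and the margin value here are my choices for the sketch):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): zero once the positive is
    at least `margin` closer to the anchor than the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

N-pair loss generalizes this by comparing the anchor against one positive and many negatives at once, instead of a single negative per triplet.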

Momentum Contrast (MoCo)

Kai Ma introduced this idea to me, and I would like to learn more about it.
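From what I understand so far, MoCo has two core tricks: the key encoder is a slow momentum copy of the query encoder, and negatives come from a FIFO queue of past keys instead of the current batch. A toy sketch of just those two mechanics (encoders here are plain weight matrices; class and method names are made up, and real MoCo uses deep networks):

```python
import numpy as np
from collections import deque

class MoCoSketch:
    """Toy sketch of MoCo: a momentum-updated key encoder plus a
    fixed-size FIFO queue ('dictionary') of negative keys."""

    def __init__(self, dim=8, queue_size=16, momentum=0.999, rng=None):
        rng = rng or np.random.default_rng(0)
        self.q_enc = rng.normal(size=(dim, dim))  # query encoder (trained by backprop)
        self.k_enc = self.q_enc.copy()            # key encoder (momentum copy, no gradients)
        self.m = momentum
        self.queue = deque(maxlen=queue_size)     # oldest keys fall off automatically

    def momentum_update(self):
        # Key encoder trails the query encoder: k <- m*k + (1-m)*q.
        self.k_enc = self.m * self.k_enc + (1 - self.m) * self.q_enc

    def encode_key(self, x):
        # Encode a sample with the key encoder and enqueue it as a
        # future negative for other queries.
        k = self.k_enc @ x
        self.queue.append(k)
        return k
```

The high momentum (e.g. 0.999) keeps the queued keys roughly consistent with each other even though the query encoder keeps changing, which is the point of the design.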