Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL)

Seems like quite an important paper: V-JEPA mentions it for how it prevents representation collapse (the encoder mapping every input to the same output).

$$L_{BYOL} = ∥ q_{θ} (z_{1}) - sg (z_{2}^{'}) ∥_{2}^{2} + ∥ q_{θ} (z_{2}) - sg (z_{1}^{'}) ∥_{2}^{2}$$

where:

$z_{1} = f_{θ} (v_{1})$ , $z_{2} = f_{θ} (v_{2})$ are representations from the online encoder $f_{θ}$ given two augmentations $v_{1}, v_{2}$ of the same image,
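$z_{1}^{'} = f_{ξ}^{'} (v_{1})$, $z_{2}^{'} = f_{ξ}^{'} (v_{2})$ are representations from the target encoder $f_{ξ}^{'}$ (its parameters $ξ$ are not trained by gradient descent; they are updated as an exponential moving average (EMA) of the online parameters $θ$),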
$q_{θ}$ is the predictor head on the online branch,
$sg (\cdot)$ means stop-gradient (no gradients flow into the target encoder),
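$∥ \cdot ∥_{2}^{2}$ is the squared $ℓ_{2}$ distance (MSE up to a constant factor). In the BYOL paper both vectors are $ℓ_{2}$-normalized first, so each term equals $2 - 2 \cos$ of the angle between prediction and target.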
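
To make the symmetric structure of the loss concrete, here is a minimal sketch in plain Python. It follows the formula exactly as written in this note (no normalization step), and the names `sq_l2` and `byol_loss`, as well as the toy predictor in the usage lines, are illustrative assumptions rather than anything from the paper's code.

```python
def sq_l2(a, b):
    """Squared l2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def byol_loss(q, z1, z2, z1_t, z2_t):
    """Symmetric BYOL loss for one pair of views (hypothetical helper).

    q          : predictor, any callable on a vector (stands in for q_theta)
    z1, z2     : online representations of views v1, v2
    z1_t, z2_t : target representations z_1', z_2'
    """
    # sg(.) is the identity on forward values. In an autodiff framework the
    # targets would be detached here so no gradient reaches the target encoder.
    return sq_l2(q(z1), z2_t) + sq_l2(q(z2), z1_t)


# Toy usage: a predictor that just doubles its input (an assumption for the demo).
double = lambda z: [2.0 * x for x in z]
loss = byol_loss(double, [1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0])
```

Note the cross pairing: the prediction from view 1 is compared with the target of view 2, and vice versa.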

Intuition

Each online view predicts the target’s representation of the other view. No negative pairs are involved; collapse is instead prevented by the combination of stop-gradient, the predictor head, and the momentum (EMA) target.
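
The momentum target can be sketched as a simple parameter update: after each online gradient step, the target parameters move a small step toward the online ones. This is a minimal sketch over flat lists of floats; the function name `ema_update` and the default `tau=0.99` are assumptions for illustration, not values taken from this note.

```python
def ema_update(target, online, tau=0.99):
    """Return tau * target + (1 - tau) * online, element-wise.

    target : current target parameters (xi), as a flat list of floats
    online : current online parameters (theta), same length
    tau    : momentum coefficient in [0, 1]; closer to 1 means a slower target
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


# Toy usage: with tau = 0.5 the target moves halfway toward the online weights.
xi = ema_update([1.0, 0.0], [0.0, 1.0], tau=0.5)
```

With `tau = 1.0` the target never changes, and with `tau = 0.0` it copies the online parameters at every step; the slowly moving target sits between these extremes.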

SimCLR

🛠️ Steven Gong
