🛠️ Steven Gong

Sep 18, 2025, 1 min read

Scaling Vision Transformers

Scaled-up ViT backbones are used by V-JEPA and V-JEPA 2.

ViT-B/L → Original ViT, DeiT, BEiT, DINO, SimMIM, MAE.
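ViT-H → Original ViT paper (the "Huge" configuration); also used by MAE and I-JEPA.
ViT-G → Google’s “Scaling Vision Transformers” (ViT-G/14); giant-scale variants also appear in DINOv2 (ViT-g/14), OpenCLIP (ViT-g and ViT-bigG, not OpenAI's original CLIP release), and Meta's I-JEPA.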
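
To make the size letters concrete, here is a rough sketch that estimates the transformer-block parameter count for each size. The depth, width, and MLP values are the commonly cited configurations for these sizes and are an assumption here, not something stated in this note. The estimate ignores biases, layer norms, the patch embedding, and the head, so it slightly undercounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    depth: int    # number of transformer blocks
    width: int    # hidden (embedding) dimension
    mlp_dim: int  # hidden dimension of the MLP in each block


# Assumed, commonly cited configurations (not taken from this note).
VIT_CONFIGS = {
    "ViT-B": ViTConfig(depth=12, width=768, mlp_dim=3072),
    "ViT-L": ViTConfig(depth=24, width=1024, mlp_dim=4096),
    "ViT-H": ViTConfig(depth=32, width=1280, mlp_dim=5120),
    "ViT-G": ViTConfig(depth=48, width=1664, mlp_dim=8192),
}


def approx_block_params(cfg: ViTConfig) -> int:
    """Weight-matrix parameters in the encoder blocks only."""
    attn = 4 * cfg.width ** 2          # Q, K, V and output projections
    mlp = 2 * cfg.width * cfg.mlp_dim  # two linear layers
    return cfg.depth * (attn + mlp)


for name, cfg in VIT_CONFIGS.items():
    print(f"{name}: ~{approx_block_params(cfg) / 1e6:.0f}M parameters")
```

Under these assumed configs, each step up roughly doubles or triples the block parameter count, going from tens of millions at ViT-B to over a billion at ViT-G.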
