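Vision Transformer scales used by V-JEPA and V-JEPA 2, with other models that use each size:
ViT-B/L → original ViT, DeiT, BEiT, DINO, SimMIM, MAE.
ViT-H → introduced in the original ViT paper; also used by MAE and Meta I-JEPA.
ViT-G/g → introduced in Google’s “Scaling Vision Transformers”; used by DINOv2 (ViT-g), JEPA models, and other foundation models (e.g., OpenCLIP ViT-G, Meta I-JEPA).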
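To make the size labels concrete, here is a minimal sketch that estimates the encoder parameter count for each scale. The depth, width, and MLP sizes are assumptions taken from the commonly cited ViT and "Scaling Vision Transformers" configurations, not from V-JEPA itself, and the names `ViTConfig`, `CONFIGS`, `block_params`, and `encoder_params` are hypothetical. The estimate covers only the transformer blocks (no patch embedding, position embeddings, or heads), so it lands slightly below published totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    depth: int    # number of transformer blocks
    width: int    # token embedding dimension
    mlp_dim: int  # hidden dimension of the MLP in each block


# Assumed hyperparameters (illustrative, not taken from the V-JEPA papers).
CONFIGS = {
    "ViT-B": ViTConfig(depth=12, width=768, mlp_dim=3072),
    "ViT-L": ViTConfig(depth=24, width=1024, mlp_dim=4096),
    "ViT-H": ViTConfig(depth=32, width=1280, mlp_dim=5120),
    "ViT-G": ViTConfig(depth=48, width=1664, mlp_dim=8192),
}


def block_params(width: int, mlp_dim: int) -> int:
    """Parameters in one pre-norm transformer block, biases included."""
    attention = 4 * width * width + 4 * width         # Q, K, V, output projection
    mlp = 2 * width * mlp_dim + mlp_dim + width       # two linear layers
    layer_norms = 4 * width                           # two LayerNorms (scale + shift)
    return attention + mlp + layer_norms


def encoder_params(cfg: ViTConfig) -> int:
    """Parameters in the full stack of transformer blocks."""
    return cfg.depth * block_params(cfg.width, cfg.mlp_dim)


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(f"{name}: {encoder_params(cfg) / 1e6:.0f}M encoder parameters")
```

Under these assumptions the block stack comes to roughly 85M (ViT-B), 302M (ViT-L), 630M (ViT-H), and 1.84B (ViT-G) parameters, which shows how quickly the count grows from one scale to the next.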