Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA)

No use of pre-training. Why don’t you believe in the power of pre-training? To show scaling laws?

Video benchmarks: VideoGLUE, K400, SSv2, AVA
Image benchmarks: ImageNet