SlowFast
Two-pathway 3D CNN for video recognition (Feichtenhofer et al., ICCV 2019, https://arxiv.org/pdf/1812.03982.pdf). Inspired by retinal P-cells (slow, color, detail) vs. M-cells (fast, motion):
- Slow pathway: low frame rate (about 2 fps), high channel capacity; captures spatial semantics
- Fast pathway: high frame rate (about 16 fps, 8× the slow rate), 1/8 the channels; captures motion
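The two frame rates amount to sampling the same raw clip at two temporal strides. A minimal sketch of that sampling follows. The function name, the 64-frame clip length, and the stride of 16 are illustrative assumptions rather than values stated in these notes; only the 8× ratio between the pathways comes from the text above.

```python
def pathway_frame_indices(num_frames, slow_stride=16, alpha=8):
    """Return (slow, fast) frame indices for one raw clip.

    slow_stride and alpha are illustrative assumptions: the slow pathway
    keeps every slow_stride-th frame, and the fast pathway samples
    alpha times as densely from the same clip.
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


# Hypothetical 64-frame clip: 4 frames go to the slow pathway, 32 to the fast one.
slow_idx, fast_idx = pathway_frame_indices(64)
print(len(slow_idx), len(fast_idx))
```

With these strides every slow frame is also a fast frame, so the two pathways see temporally aligned inputs at different densities.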
Lateral connections fuse fast→slow features. The asymmetric channel split keeps the fast pathway cheap despite its 8× temporal resolution.
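A back-of-the-envelope calculation shows why the channel split outweighs the extra frames. The sketch below assumes a convolution's cost scales with frames × input channels × output channels; it ignores kernel shapes, spatial size (taken as equal in both pathways), and the lateral connections. The base width of 64 channels and the 4 slow frames are hypothetical.

```python
def relative_conv_cost(frames, channels):
    """Cost proxy for one conv layer: frames * C_in * C_out, with C_in == C_out."""
    return frames * channels * channels


ALPHA = 8            # fast pathway sees 8x as many frames
CHANNEL_DIVISOR = 8  # fast pathway has 1/8 the channels

# Hypothetical layer: 4 slow frames, 64 slow channels.
slow_cost = relative_conv_cost(frames=4, channels=64)
fast_cost = relative_conv_cost(frames=4 * ALPHA, channels=64 // CHANNEL_DIVISOR)

ratio = fast_cost / slow_cost                    # 8 * (1/8)**2 = 1/8
fast_share = fast_cost / (fast_cost + slow_cost) # 1/9 of the two-pathway total
print(ratio, round(fast_share, 3))
```

Under this proxy the 8× increase in frames is more than offset by the quadratic saving from narrower channels, so the fast pathway costs about one eighth of the slow pathway per layer.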
CS231n 2025 Lec 10 lists SlowFast (with Nonlocal block) at 79.8% top-1 on Kinetics-400, placing it between I3D (74.2%) and the modern ViT-style video models (MViTv2-L 86.1%, VideoMAE V2-g 90%).