Optical Flow

Measures the apparent motion of pixels between consecutive images.

Optical Flow Tracking

I remember seeing in the Visual Odometry from scratch tutorial how they used optical flow to track features, instead of matching descriptors.

This idea is also presented in the slambook-en.

Camera motion estimation is still the same with this approach; only the way feature correspondences are obtained changes.
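To make the feature-tracking idea concrete, here is a minimal sketch of a single Lucas-Kanade step for one window. The choice of Lucas-Kanade is my assumption (neither source is quoted here on which flow method they use), and `make_frame` and `lucas_kanade_step` are hypothetical helper names. It uses a synthetic image in pure Python, no pyramid and no iteration, so it only handles small sub-pixel shifts.

```python
import math

def make_frame(shift_x=0.0, shift_y=0.0, size=64):
    """Synthetic smooth image whose content is moved by (shift_x, shift_y) pixels."""
    return [[math.sin(0.3 * (x - shift_x)) + math.cos(0.2 * (y - shift_y))
             for x in range(size)]
            for y in range(size)]

def lucas_kanade_step(frame0, frame1, cx, cy, half=10):
    """Estimate the displacement (dx, dy) of the window centred at (cx, cy).

    Solves the 2x2 normal equations of  Ix*dx + Iy*dy = -It
    accumulated over a (2*half + 1)^2 window.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0  # spatial gradient in y
            it = frame1[y][x] - frame0[y][x]                  # temporal difference
            a11 += ix * ix
            a12 += ix * iy
            a22 += iy * iy
            b1 -= ix * it
            b2 -= iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")
    # Cramer's rule for the 2x2 system
    dx = (a22 * b1 - a12 * b2) / det
    dy = (a11 * b2 - a12 * b1) / det
    return dx, dy

frame0 = make_frame()
frame1 = make_frame(shift_x=0.3, shift_y=-0.2)  # content moved by (0.3, -0.2)
dx, dy = lucas_kanade_step(frame0, frame1, cx=30, cy=30)
print(f"estimated displacement: ({dx:.3f}, {dy:.3f})")
```

The estimate should land close to the true shift of (0.3, -0.2). The tracked position `(cx + dx, cy + dy)` then plays the role a descriptor match would, which is why the pose estimation step downstream does not change.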

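The other way is to use the optical-flow idea in the direct method, which estimates camera motion from pixel intensities directly instead of from matched features.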

Implementations of the direct method

  • LSD-SLAM
  • SVO (semi-direct)
  • DSO

Definition + Two-Stream Networks (CS231n 2025 Lec 10)

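For a pair of consecutive frames $I_t$ and $I_{t+1}$, optical flow is a per-pixel displacement field $F(x, y) = (dx, dy)$ such that

$$I_{t+1}(x + dx,\ y + dy) = I_t(x, y)$$

i.e. for every pixel, where it went in the next frame. Stack flow fields across the $T-1$ consecutive pairs of a $T$-frame clip and you get a $2(T-1) \times H \times W$ tensor (horizontal + vertical components per pair).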
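A small sketch of the definition and of the stacking step, under simplifying assumptions of mine: integer displacements, wrap-around borders, and a flow field stored as `[dx_channel, dy_channel]`. Here `T` is the number of frames in the clip, and every function name is hypothetical.

```python
def shift_frame(frame, dx, dy):
    """Move every pixel of an H x W frame by (dx, dy), wrapping at the borders."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[(y + dy) % h][(x + dx) % w] = frame[y][x]
    return out

def constant_flow(dx, dy, h, w):
    """Flow field of shape (2, H, W): channel 0 is horizontal, channel 1 is vertical."""
    return [[[dx] * w for _ in range(h)], [[dy] * w for _ in range(h)]]

def satisfies_flow(frame_t, frame_t1, flow):
    """Check I_{t+1}(x + dx, y + dy) == I_t(x, y) at every pixel."""
    h, w = len(frame_t), len(frame_t[0])
    return all(
        frame_t1[(y + flow[1][y][x]) % h][(x + flow[0][y][x]) % w] == frame_t[y][x]
        for y in range(h) for x in range(w)
    )

def stack_flows(flows):
    """Concatenate T-1 flow fields of shape (2, H, W) into one (2*(T-1), H, W) stack."""
    return [channel for flow in flows for channel in flow]

H, W, T = 4, 5, 4
# First frame has a unique value per pixel; each later frame is the previous one moved by (1, 2).
frames = [[[y * W + x for x in range(W)] for y in range(H)]]
for _ in range(T - 1):
    frames.append(shift_frame(frames[-1], dx=1, dy=2))

flows = [constant_flow(1, 2, H, W) for _ in range(T - 1)]
checks = [satisfies_flow(frames[i], frames[i + 1], flows[i]) for i in range(T - 1)]
stacked = stack_flows(flows)
shape = (len(stacked), len(stacked[0]), len(stacked[0][0]))
print(checks, shape)
```

With `T = 4` the stack has `2 * (T - 1) = 6` channels, two per frame pair. Real flow is sub-pixel and varies per pixel, so an actual check would need interpolation rather than exact equality.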

Why care for action recognition

Johansson 1973 (point-light biological motion) showed humans can recognize actions purely from joint motion — no appearance. Pixel intensities mix appearance + motion together; optical flow isolates motion, so a CNN trained on flow can’t cheat on background context.

Two-Stream Networks (Simonyan & Zisserman NeurIPS 2014)

Two parallel CNNs, each producing class scores; average (or SVM-fuse) the scores at the end:

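| stream | input | what it captures |
|---|---|---|
| Spatial | single RGB frame, $3 \times H \times W$ | appearance (objects, scene) |
| Temporal | optical-flow stack from $T$ frames, $2(T-1) \times H \times W$, “early-fused” by being read by the first 2D conv | motion |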

The temporal stream is essentially a 2D CNN whose first conv takes $2(T-1)$ input channels (two per frame pair), so flow is fused immediately at layer 1.
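A minimal sketch of the average-fusion step and of the temporal stream's input width. The class scores below are made-up numbers for three hypothetical classes, not outputs of any trained network, and the function names are mine. I am assuming the averaging is done on softmax probabilities rather than raw scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_fusion(spatial_scores, temporal_scores):
    """Late fusion: average the per-class probabilities of the two streams."""
    ps = softmax(spatial_scores)
    pt = softmax(temporal_scores)
    return [(a + b) / 2.0 for a, b in zip(ps, pt)]

def temporal_in_channels(num_frames):
    """Input channels of the temporal stream's first conv: 2 per consecutive frame pair."""
    return 2 * (num_frames - 1)

# Made-up scores: the spatial stream prefers class 0, the temporal stream class 1.
spatial_scores = [2.0, 1.0, 0.1]
temporal_scores = [0.5, 2.5, 0.3]

fused = average_fusion(spatial_scores, temporal_scores)
predicted = fused.index(max(fused))
print(predicted, [round(p, 3) for p in fused])
print(temporal_in_channels(11))
```

In this toy case the temporal stream is the more confident one, so the fused prediction follows it. The SVM-fusion variant would instead train a classifier on the two score vectors, which is not shown here.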

UCF-101 accuracy (action classification, 101 classes):

| model | accuracy (%) |
|---|---|
| 3D CNN (C3D-style) | 65.4 |
| Spatial stream alone | 73.0 |
| Temporal stream alone | 83.7 |
| Two-stream (avg fusion) | 86.9 |
| Two-stream (SVM fusion) | 88.0 |

The temporal stream alone beats the spatial stream by about 10 points, so on UCF-101 motion is a stronger cue than appearance for action recognition. Fusion adds a further 3 to 4 points, which suggests the two streams make largely independent errors.

Two-stream nets are foundational: the best I3D variant is itself two-stream (74.2 on Kinetics-400, vs 71.1 for single-stream).

Source

CS231n 2025 Lec 10 slides 49–58 (Johansson biological motion, optical flow definition, Two-Stream architecture, UCF-101 results). 2026 PDF not published — using 2025 fallback.