Distribution Shift

This is a very common term in the robotics world: it refers to a model being evaluated on inputs drawn from a different distribution than the one it was trained on.

In imitation learning, the policy is trained on demonstrations from an expert.

  • Training data consists of states visited by the expert → (s, a_expert).
  • But when the learned policy is deployed, it doesn’t act exactly like the expert — it makes small mistakes.
  • These mistakes push the agent into new states the expert never visited, meaning the learner faces inputs it never saw in training.
  • This mismatch between training distribution (expert’s states) and test distribution (learner’s states) is the distribution shift.
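
A way to make this concrete (not from the original note, but the standard bound from Ross & Bagnell's analysis of behavior cloning; ε and T below are my notation for the per-step imitation error and the horizon): the same per-step error costs very differently depending on whose state distribution you evaluate under.

```latex
% Per-step imitation error \epsilon, horizon T.
% Evaluated on the expert's own states, mistakes cost O(T\epsilon).
% Evaluated on the states the learner itself induces, an early mistake
% can derail every later step, so the cost can compound quadratically:
J(\hat{\pi}) \;\le\; J(\pi^{*}) + T^{2}\epsilon
```

This quadratic gap is exactly the price of the mismatch: methods like DAgger that collect corrections on the learner's own states bring the bound back down to O(Tε).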

Result: errors compound over time → performance collapses.
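
Here is a minimal simulation of that collapse. It is a toy sketch under my own assumptions, not ALOHA's setup: the expert's action keeps the state exactly on the reference, and the cloned policy reproduces that action up to a small residual `eta`. Because the learner was only ever trained on on-reference states, it cannot correct the deviation its own mistakes create, so the residuals simply integrate.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_rollouts = 200, 1000
eta = 0.05  # hypothetical per-step imitation error, in action units

# Toy dynamics: deviation' = deviation + action error.
# The expert's deviation is always 0; the clone adds a small
# residual each step and never sees data telling it how to undo it.
err = np.zeros(n_rollouts)           # deviation from the reference
mean_abs_err = []
for t in range(T):
    err = err + eta * rng.standard_normal(n_rollouts)
    mean_abs_err.append(np.abs(err).mean())

for t in (1, 10, 50, 200):
    print(f"t={t:3d}  mean |deviation| = {mean_abs_err[t - 1]:.3f}")
```

The mean deviation grows roughly like eta·√t: each individual mistake is tiny, but with no corrective signal the rollout drifts further and further from the states the policy was trained on.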

This initial diagram was created while I was reading ALOHA, which is also mentioned in my World Model + RL post.