Distribution Shift
Distribution shift is a very common term in the robotics world.
In imitation learning, the policy is trained on demonstrations from an expert.
- Training data consists of state-action pairs collected in states visited by the expert → (s, a_expert).
- But when the learned policy is deployed, it doesn’t act exactly like the expert — it makes small mistakes.
- These mistakes push the agent into new states the expert never visited, meaning the learner faces inputs it never saw in training.
- This mismatch between training distribution (expert’s states) and test distribution (learner’s states) is the distribution shift.
Result: errors compound over time → performance collapses, because each mistake leaves the policy in states where its predictions are even less reliable, so the next mistake tends to be larger.
I first made this diagram while reading ALOHA; it is also mentioned in my World Model + RL post.
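To make the compounding-error story above concrete, here is a minimal sketch of distribution shift in behavior cloning. Everything in it is an illustrative assumption on my part (a toy circular-track task, a 1-nearest-neighbor "cloned" policy), not the setup from ALOHA or any particular paper: the expert collects demonstrations on the track, the cloned policy copies the nearest demonstrated action, and we measure how far its own rollout drifts from the states seen in training.

```python
# Toy illustration of distribution shift in behavior cloning.
# The task, policies, and numbers are illustrative assumptions, not from any specific paper.
import numpy as np

rng = np.random.default_rng(0)

def expert_action(s):
    """Expert drives counter-clockwise around the unit circle and actively
    corrects any radial drift, so its own rollouts stay near the circle."""
    r = np.linalg.norm(s)
    tangent = 0.1 * np.array([-s[1], s[0]]) / max(r, 1e-8)   # move along the circle
    radial = -0.2 * (r - 1.0) * s / max(r, 1e-8)             # steer back toward radius 1
    return tangent + radial

def rollout(policy, T=300, noise=0.01):
    s = np.array([1.0, 0.0])
    states = []
    for _ in range(T):
        s = s + policy(s) + rng.normal(0.0, noise, size=2)   # small execution noise
        states.append(s.copy())
    return np.array(states)

# 1. Collect demonstrations: (state, expert action) pairs along expert rollouts.
demo_states = rollout(expert_action)
demo_actions = np.array([expert_action(s) for s in demo_states])

# 2. "Behavior cloning" via 1-nearest-neighbor: copy the action of the closest demo state.
def cloned_action(s):
    i = np.argmin(np.linalg.norm(demo_states - s, axis=1))
    return demo_actions[i]

# 3. Roll out the cloned policy and measure how far it strays from the training data.
learner_states = rollout(cloned_action)
dist_to_data = np.min(
    np.linalg.norm(learner_states[:, None, :] - demo_states[None, :, :], axis=2), axis=1
)
print(f"mean distance to training states, first 50 steps: {dist_to_data[:50].mean():.3f}")
print(f"mean distance to training states, last 50 steps:  {dist_to_data[-50:].mean():.3f}")
```

On a run like this, the learner's distance to the training states typically grows with the horizon: the nearest-neighbor policy copies corrections that were appropriate for the expert's states, not for the states the learner actually ends up in, which is exactly the training/test mismatch described above.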