Hierarchical Policy
I first came across this idea in Hierarchical Behavior Cloning (HBC).
“HBC consists of a low-level policy that is conditioned on future observations s_g ∈ S (termed subgoals) and outputs action sequences to try and achieve them, and a high-level policy that predicts future subgoals from the current observation.”
- Robomimic paper
Some more papers on hierarchical planning:
In the Context of RL
So we have two policies:
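- a high-level policy that predicts subgoals (future goal states) from the current observation
- a low-level policy that predicts the actions needed to get to those goal states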
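To make the high-level / low-level split concrete, here is a minimal sketch on a toy 1D problem. Everything in it is an assumption for illustration and not taken from HBC or Robomimic: the names `high_level_policy`, `low_level_policy`, and `rollout`, the parameters `max_jump` and `horizon`, and the toy dynamics `s' = s + a`. Both policies are hand-written rules here; in practice they would be learned.

```python
from typing import List

State = float
Action = float


def high_level_policy(state: State, goal: State, max_jump: float = 1.0) -> State:
    """Predict a subgoal: a future state at most `max_jump` away, toward the goal."""
    delta = goal - state
    delta = max(-max_jump, min(max_jump, delta))  # clip the jump size
    return state + delta


def low_level_policy(state: State, subgoal: State, horizon: int = 4) -> List[Action]:
    """Output a sequence of `horizon` actions that moves the state to the subgoal."""
    step = (subgoal - state) / horizon
    return [step] * horizon


def rollout(state: State, goal: State, max_subgoals: int = 10, tol: float = 1e-6) -> List[State]:
    """Alternate between picking a subgoal and executing the action sequence for it."""
    trajectory = [state]
    for _ in range(max_subgoals):
        if abs(goal - state) <= tol:
            break  # final goal reached
        subgoal = high_level_policy(state, goal)
        for action in low_level_policy(state, subgoal):
            state = state + action  # hypothetical toy dynamics: s' = s + a
            trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(rollout(0.0, 2.5))
```

The point of the split is that the high-level policy only reasons about where to be next, while the low-level policy only reasons about how to get there over a short horizon.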