Hierarchical Policy

I first saw this idea in Hierarchical Behavior Cloning (HBC).

“HBC consists of a low-level policy that is conditioned on future observations $s_{g} \in S$ (termed subgoals) and outputs action sequences to try and achieve them, and a high-level policy that predicts future subgoals from the current observation.”

Source: the Robomimic paper.

Some more papers on hierarchical planning:

Compositional Foundation Models for Hierarchical Planning

In the Context of RL

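So we have two policies:

A high-level policy $π_{θ}^{H} (s_{t})$ that predicts the subgoal $s_{g} = s_{t + T}$, i.e. the state expected $T$ steps ahead.
A low-level policy $π_{θ}^{L} (s_{t}, s_{g})$ that predicts the action $a_{t}$ to take in order to reach the subgoal $s_{g}$.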
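
To make the interaction between the two policies above concrete, here is a minimal sketch of a rollout loop. Everything in it is an illustrative assumption and not taken from HBC or any library: the 1D state, the fixed `target`, the hand-written `high_level_policy` and `low_level_policy` (standing in for learned networks), and the `step` dynamics. It also simplifies the low-level policy to emit one action per step, whereas HBC's low-level policy outputs action sequences.

```python
from typing import List

State = float
Action = float


def high_level_policy(s: State, target: State = 10.0,
                      horizon: int = 5, max_step: float = 1.0) -> State:
    """Hypothetical stand-in for pi^H(s): propose a subgoal s_g that is
    reachable within `horizon` steps, moving toward `target`."""
    reach = horizon * max_step
    delta = max(-reach, min(reach, target - s))
    return s + delta


def low_level_policy(s: State, s_g: State, max_step: float = 1.0) -> Action:
    """Hypothetical stand-in for pi^L(s, s_g): one bounded step toward s_g."""
    return max(-max_step, min(max_step, s_g - s))


def step(s: State, a: Action) -> State:
    """Toy deterministic dynamics (assumed): the action is a displacement."""
    return s + a


def rollout(s0: State, num_steps: int = 12, horizon: int = 5) -> List[State]:
    """Re-query the high-level policy every `horizon` steps; the low-level
    policy acts at every step, conditioned on the current subgoal."""
    s = s0
    s_g = s0
    states = [s]
    for t in range(num_steps):
        if t % horizon == 0:
            s_g = high_level_policy(s, horizon=horizon)
        a = low_level_policy(s, s_g)
        s = step(s, a)
        states.append(s)
    return states


if __name__ == "__main__":
    print(rollout(0.0))
```

The key design point the sketch tries to show is the two time scales: the subgoal changes only every `horizon` steps, while actions are chosen every step. In a learned setting both functions would be replaced by trained models.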

https://www.figure.ai/news/helix

🛠️ Steven Gong
