🛠️ Steven Gong

Aug 16, 2025, 1 min read

High-Dimensional Continuous Control Using Generalized Advantage Estimation

I heard about this paper from OpenAI's Spinning Up:

“For a more detailed treatment of this topic, you should read the paper on Generalized Advantage Estimation (GAE), which goes into depth about different choices of $Φ_{t}$ in the background sections.”
https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#id16

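There are a few $Φ_{t}$ that we could choose:

- $Φ_{t} = R(τ)$, the total return of the trajectory
- $Φ_{t} = \sum_{t' = t}^{T} R(s_{t'}, a_{t'}, s_{t' + 1})$, the Reward-to-go
- $Φ_{t} = R_{t} - V(s_{t})$, where $R_{t}$ is the Reward-to-go and $V(s_{t})$ acts as a baseline
- $Φ_{t} = Q(s_{t}, a_{t}) - V(s_{t})$, which is known as the Advantage Function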
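
As a rough illustration of how the first three weights differ, here is a minimal Python sketch for a single finite episode. The function names, the toy `rewards`, and the `values` list (standing in for estimates of $V(s_{t})$) are all made up for this example. It is undiscounted, to match the formulas above.

```python
def reward_to_go(rewards):
    """R_t = sum of rewards from step t to the end of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for t + 1.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        out[t] = running
    return out


def phi_choices(rewards, values):
    """Per-step weights for three of the choices of Phi_t listed above.

    `values` is a hypothetical list of V(s_t) estimates, one per step.
    """
    total = sum(rewards)
    rtg = reward_to_go(rewards)
    return {
        # Phi_t = R(tau): every step gets the whole-trajectory return.
        "total_return": [total] * len(rewards),
        # Phi_t = reward-to-go: only rewards from step t onward count.
        "reward_to_go": rtg,
        # Phi_t = R_t - V(s_t): reward-to-go with a value baseline subtracted.
        "reward_to_go_minus_baseline": [r - v for r, v in zip(rtg, values)],
    }


rewards = [1.0, 0.0, 2.0]
values = [2.5, 1.5, 1.0]  # made-up V(s_t) estimates
weights = phi_choices(rewards, values)
print(weights["reward_to_go"])                 # [3.0, 2.0, 2.0]
print(weights["reward_to_go_minus_baseline"])  # [0.5, 0.5, 1.0]
```

The baseline version keeps the same ordering information as the plain reward-to-go but with smaller magnitudes, which is the usual motivation for subtracting $V(s_{t})$.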

https://arxiv.org/pdf/1506.02438
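
The paper linked above estimates the advantage as an exponentially weighted sum of TD residuals, $\hat{A}_{t} = \sum_{l \ge 0} (γλ)^{l} δ_{t + l}$ with $δ_{t} = r_{t} + γ V(s_{t + 1}) - V(s_{t})$. Below is a minimal sketch of that recursion for one finite episode. The function name, the default `gamma` and `lam`, and the toy inputs are assumptions for illustration, not values taken from this note.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE(gamma, lambda) advantage estimates for one episode.

    `values` must hold len(rewards) + 1 entries: V(s_0), ..., V(s_T),
    where the last entry is the bootstrap value (0.0 for a terminal state).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


rewards = [1.0, 0.0, 2.0]
values = [2.5, 1.5, 1.0, 0.0]  # made-up V estimates; final 0.0 marks a terminal state

# lam = 1 (with gamma = 1) recovers reward-to-go minus the baseline V(s_t).
print(gae_advantages(rewards, values, gamma=1.0, lam=1.0))  # [0.5, 0.5, 1.0]
# lam = 0 reduces to the one-step TD residuals.
print(gae_advantages(rewards, values, gamma=1.0, lam=0.0))  # [0.0, -0.5, 1.0]
```

Between those two extremes, `lam` trades bias against variance. Smaller values lean more on the value estimates, while larger values lean more on the sampled rewards.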

