

Aug 16, 2025, 1 min read

Reward-to-go

The sum of rewards from the current time step to the end of the trajectory. Learned from Spinning Up, which presents the reward-to-go as the target for learning the value function; that value function is then used to compute the Generalized Advantage Estimate.
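For context, the GAE paper (backlinked below) defines the advantage estimate in terms of TD residuals of that learned value function $V$:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad \hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^l \, \delta_{t+l}$$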

  • https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#id16

    $\hat{R}_t = \sum_{t'=t}^{T} R(s_{t'}, a_{t'}, s_{t'+1})$

    Where is the discount factor? I think in practice a discount factor $\gamma$ is actually used, i.e. $\hat{R}_t = \sum_{t'=t}^{T} \gamma^{t'-t} R(s_{t'}, a_{t'}, s_{t'+1})$ (see the sketch below).
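A minimal sketch of computing the discounted reward-to-go using the standard backward recursion $R_t = r_t + \gamma R_{t+1}$. This is a generic illustration, not Spinning Up's actual implementation; the function name and the `gamma` default are my own.

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go at each timestep.

    R_t = sum_{t'=t}^{T} gamma^(t'-t) * r_{t'},
    computed backward via R_t = r_t + gamma * R_{t+1}.
    """
    rtg = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# Constant reward of 1 over 4 steps with gamma = 0.5:
# [1 + 0.5 + 0.25 + 0.125, 1 + 0.5 + 0.25, 1 + 0.5, 1]
print(reward_to_go([1.0, 1.0, 1.0, 1.0], gamma=0.5))  # [1.875 1.75 1.5 1.]
```

Setting `gamma=1.0` recovers the undiscounted sum from the Spinning Up formula above.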


Backlinks

  • High-Dimensional Continuous Control Using Generalized Advantage Estimation
