Reward-to-go
The sum of rewards from this point (time step t) to the end of the trajectory. Learned from Spinning Up, which presents this as the regression target for learning the value function; the learned value function is then used for Generalized Advantage Estimation (GAE).
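For reference, the finite-horizon, undiscounted reward-to-go as defined on the Spinning Up page:

$$\hat{R}_t = \sum_{t'=t}^{T} R(s_{t'}, a_{t'}, s_{t'+1})$$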
- https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#id16. The formula there has no discount factor; I think in practice the discount factor is actually used (see the sketch below).
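A minimal sketch of computing the discounted reward-to-go for a single trajectory; the function name, the `gamma` default, and the backward-accumulation style are illustrative assumptions, not taken from the Spinning Up code:

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go for each timestep of one trajectory.

    rtg[t] = r[t] + gamma * r[t+1] + gamma^2 * r[t+2] + ...
    With gamma = 1.0 this reduces to the undiscounted sum in the
    Spinning Up formula above.
    """
    rtg = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Accumulate from the end of the trajectory backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# Example: three steps, reward 1.0 each, gamma = 0.9
print(reward_to_go([1.0, 1.0, 1.0], gamma=0.9))
# -> [2.71, 1.9, 1.0]
```

These per-timestep values would serve as the targets when fitting the value function, which GAE then uses to form the advantage estimates.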