🛠️ Steven Gong

Monte-Carlo Policy Gradient (REINFORCE)

Feb 11, 2026, 1 min read

Policy Gradient Methods

Monte-Carlo Policy Gradient (REINFORCE)

This is a specific instance of Vanilla Policy Gradient; see that page for notes on how it's implemented.

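It uses the log-likelihood trick to estimate the gradient of the expected return from sampled episodes: $\nabla_{θ} J (θ) = E_{π_{θ}} [\nabla_{θ} \log π_{θ} (a_{t} ∣ s_{t}) G_{t}]$, where $G_{t}$ is the return observed from step $t$ onward.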
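
To make the estimator concrete, here is a minimal sketch of a single-episode REINFORCE gradient. It is not taken from the linked pages: the tabular softmax policy, the discount factor `gamma`, and the function names `discounted_returns`, `softmax`, and `reinforce_gradient` are all assumptions chosen for illustration.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Single-episode estimate of grad J(theta).

    Assumes a tabular softmax policy (hypothetical setup):
    theta[s][a] is the logit of action a in state s.
    """
    grad = [[0.0] * len(row) for row in theta]
    returns = discounted_returns(rewards, gamma)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta[s])
        for b in range(len(probs)):
            # d/d theta[s][b] of log pi(a|s) is 1[a == b] - pi(b|s).
            indicator = 1.0 if b == a else 0.0
            grad[s][b] += (indicator - probs[b]) * g
    return grad


# Toy episode: one state, two actions, action 1 taken and rewarded with 2.0.
grad = reinforce_gradient([[0.0, 0.0]], [0], [1], [2.0], gamma=1.0)
```

A gradient ascent step would then add a small multiple of `grad` to `theta`, raising the probability of actions that were followed by high returns. In practice the estimate is usually averaged over many episodes, since a single Monte-Carlo sample has high variance.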

Resources

https://lilianweng.github.io/posts/2018-04-08-policy-gradient/#reinforce

Related

PPO
