🛠️ Steven Gong

Aug 21, 2025, 1 min read

Policy Gradient Methods

Monte-Carlo Policy Gradient (REINFORCE)

REINFORCE is a specific instance of the Vanilla Policy Gradient; see that page for notes on how it is implemented.

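It uses the log-likelihood trick to estimate the gradient of the expected return $J(\theta)$ from sampled episodes, weighting each score term by the Monte-Carlo return $G_t$:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]$$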
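To make the estimator concrete, here is a minimal sketch of a single-episode gradient estimate in plain Python. It assumes a tabular softmax policy in which `theta[s][a]` is the logit of action `a` in state `s`. The function names, the policy parameterization, and the default `gamma` are illustrative choices, not something these notes specify. The sketch also leaves out the extra $\gamma^t$ factor that some presentations of REINFORCE include.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_{k >= t} gamma^(k - t) * r_k for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Single-episode estimate of grad J(theta) for a tabular softmax policy.

    Assumption: theta[s][a] is the logit of action a in state s.
    """
    grad = [[0.0] * len(row) for row in theta]
    returns = discounted_returns(rewards, gamma)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta[s])
        for b in range(len(probs)):
            # d/d theta[s][b] of log pi(a | s) is 1[a == b] - pi(b | s).
            indicator = 1.0 if b == a else 0.0
            grad[s][b] += (indicator - probs[b]) * g
    return grad


# One state, two actions, a two-step episode with undiscounted returns.
theta = [[0.0, 0.0]]
grad = reinforce_gradient(theta, states=[0, 0], actions=[1, 0],
                          rewards=[1.0, 0.0], gamma=1.0)
```

A gradient ascent step would then add `learning_rate * grad` to `theta`. In practice the estimate is usually averaged over many episodes, since a single-episode estimate has high variance.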

Resources

  • https://lilianweng.github.io/posts/2018-04-08-policy-gradient/#reinforce

Related

  • PPO
