Monte-Carlo Policy Gradient (REINFORCE)
This is a specific instance of Vanilla Policy Gradient; see that page for notes on how it's implemented.
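It uses the log-likelihood trick to estimate the gradient of the expected return from complete sampled episodes: the gradient of each action's log-probability is weighted by the Monte-Carlo return that followed that action.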
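As a rough illustration, the estimator is grad J(theta) ≈ sum_t gamma^t * G_t * grad log pi(a_t | s_t), where G_t is the discounted return from step t to the end of the episode. Below is a minimal NumPy sketch of that update. Everything concrete in it is an assumption made for the example, not something from this page: the toy chain environment, the tabular softmax policy, the hyperparameters, and the helper names (`step`, `run_episode`, `discounted_returns`, `reinforce`). The gamma^t factor follows the episodic textbook form; many implementations drop it.

```python
import numpy as np

# Hypothetical toy environment (assumption): a chain of N_STATES cells.
# Action 1 moves right, action 0 moves left (staying put at cell 0).
# Reaching the last cell gives reward +1 and ends the episode.
N_STATES, N_ACTIONS, MAX_STEPS = 5, 2, 20

def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def run_episode(theta, rng):
    """Sample one full episode with the tabular policy pi(a|s) = softmax(theta[s])."""
    states, actions, rewards = [], [], []
    s = 0
    for _ in range(MAX_STEPS):
        a = rng.choice(N_ACTIONS, p=softmax(theta[s]))
        s_next, r, done = step(s, a)
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s_next
        if done:
            break
    return states, actions, rewards

def discounted_returns(rewards, gamma):
    """Monte-Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    return returns

def reinforce(n_episodes=1000, alpha=0.1, gamma=0.99, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, N_ACTIONS))  # policy logits, one row per state
    for _ in range(n_episodes):
        states, actions, rewards = run_episode(theta, rng)
        returns = discounted_returns(rewards, gamma)
        grad = np.zeros_like(theta)
        for t, (s, a, G_t) in enumerate(zip(states, actions, returns)):
            # Log-likelihood trick for a softmax policy:
            # grad_theta[s] log pi(a|s) = one_hot(a) - pi(.|s)
            grad_log_pi = -softmax(theta[s])
            grad_log_pi[a] += 1.0
            grad[s] += (gamma ** t) * G_t * grad_log_pi
        theta += alpha * grad  # one gradient-ascent step per episode
    return theta
```

Usage: `theta = reinforce()` trains the policy, after which `softmax(theta[s])[1]` gives the learned probability of moving right in state `s`. Failed episodes have all-zero returns and so contribute no gradient, which is one reason plain REINFORCE has high variance.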
Resources