Policy Gradient Methods

Actor-Critic Policy Gradient

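Actor-critic methods decouple the policy update from the Q-function update. The actor (the policy) is adjusted by a policy-gradient step, while the critic (an estimate of the Q-function) is updated separately, typically by temporal-difference learning. The actor's gradient then uses the critic's estimate in place of sampled returns.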
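The two separate updates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the problem is a hypothetical single-state, three-action bandit (`TRUE_MEANS`, `NOISE_STD`, and the learning rates are made up for illustration). The actor is a softmax over action preferences, and the critic keeps one Q-value estimate per action. Because each episode ends after one step, the critic's target is just the observed reward, so this is a degenerate case with no bootstrapping.

```python
import math
import random

# Hypothetical toy problem (an assumption, not from the text): one state,
# three actions, reward = fixed mean + Gaussian noise.
TRUE_MEANS = [0.2, 0.5, 1.0]
NOISE_STD = 0.5


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(steps=5000, actor_lr=0.05, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    n = len(TRUE_MEANS)
    theta = [0.0] * n  # actor parameters: softmax preferences
    q = [0.0] * n      # critic: estimated Q-value of each action
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(n), weights=pi)[0]
        reward = TRUE_MEANS[a] + rng.gauss(0.0, NOISE_STD)

        # Critic update: move Q(a) toward the observed reward. The episode
        # ends after one step, so there is no bootstrapped next-state term.
        q[a] += critic_lr * (reward - q[a])

        # Actor update: policy-gradient step weighted by the critic's
        # advantage Q(a) - V, where V is the expected Q under the policy.
        v = sum(p * qa for p, qa in zip(pi, q))
        advantage = q[a] - v
        for i in range(n):
            grad_log_pi = (1.0 if i == a else 0.0) - pi[i]  # d log pi(a) / d theta_i
            theta[i] += actor_lr * advantage * grad_log_pi
    return softmax(theta), q


if __name__ == "__main__":
    policy, q_values = actor_critic()
    print("policy:", [round(p, 3) for p in policy])
    print("Q estimates:", [round(x, 2) for x in q_values])
```

The critic learns faster than the actor here (larger step size and a simpler target), which keeps the advantage estimates reasonable while the policy is still changing. In a multi-state problem the critic's target would instead include a bootstrapped term such as the discounted Q-value of the next state and action.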

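  • Policy-based methods optimize the policy directly, but their gradient estimates, built from sampled returns, have high variance
  • Value-based methods estimate values but don’t represent a policy explicitly; the policy is obtained by acting greedily on the estimates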
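The contrast in the two bullets can be made concrete with a small experiment. The sketch below assumes the same kind of hypothetical three-action bandit (`TRUE_MEANS`, `NOISE_STD`, and all helper names are illustrative assumptions) and a fixed uniform softmax policy. It compares per-sample estimates of one policy-gradient component: once weighted by the sampled reward (the Monte Carlo, policy-based estimate) and once by the advantage from an idealized critic that knows the true Q-values. Both estimators have the same expectation, but the critic-weighted one has much smaller variance. `greedy_action` shows how a value-based method turns its estimates into a policy.

```python
import math
import random
import statistics

# Hypothetical 3-action bandit (an assumption for illustration only).
TRUE_MEANS = [0.2, 0.5, 1.0]
NOISE_STD = 0.5


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_samples(n=20000, seed=1):
    """Per-sample estimates of dJ/dtheta_2 under a uniform softmax policy."""
    rng = random.Random(seed)
    pi = softmax([0.0, 0.0, 0.0])
    # Idealized critic (assumption): it knows the true Q-values exactly.
    q = TRUE_MEANS
    v = sum(p * qa for p, qa in zip(pi, q))
    monte_carlo, with_critic = [], []
    for _ in range(n):
        a = rng.choices(range(3), weights=pi)[0]
        reward = q[a] + rng.gauss(0.0, NOISE_STD)
        score = (1.0 if a == 2 else 0.0) - pi[2]  # d log pi(a) / d theta_2
        monte_carlo.append(reward * score)         # policy-based: sampled return
        with_critic.append((q[a] - v) * score)     # actor-critic: advantage
    return monte_carlo, with_critic


def greedy_action(q_values):
    """Value-based methods act greedily w.r.t. their value estimates."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    mc, ac = gradient_samples()
    print("mean MC vs critic:", statistics.mean(mc), statistics.mean(ac))
    print("var  MC vs critic:", statistics.variance(mc), statistics.variance(ac))
    print("greedy action from true Q:", greedy_action(TRUE_MEANS))
```

Under these assumptions both estimators have mean 1.3/9 ≈ 0.144, and the analytic variances are roughly 0.19 (Monte Carlo) and 0.012 (idealized critic). A learned critic is only an approximation of the true Q-values, so in practice it also introduces bias into the gradient estimate.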