Actor-Critic Methods
The point of actor-critic methods is to decouple the policy-gradient update (the actor) from the Q-function update (the critic), combining the strengths of the two families below (made precise in the gradient expression after the list).
- Policy-based methods directly optimize the policy, but their gradient estimates have high variance
- Value-based methods estimate values but don’t directly yield a policy
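Concretely (this formula is standard, though not spelled out in the original notes): writing $\pi_\theta$ for the parameterized policy (the actor) and $Q^{\pi_\theta}$ for its Q-function (the critic), the policy-gradient theorem lets the actor use the critic's estimate in place of a high-variance Monte Carlo return:

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{s,\, a \sim \pi_\theta}\!\left[\, \nabla_\theta \log \pi_\theta(a \mid s)\; Q^{\pi_\theta}(s, a) \,\right]$$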
Unlike Q-learning (a value-based method), which directly attempts to learn the optimal Q-function $Q^*$, actor-critic methods aim to learn the Q-function $Q^{\pi_\theta}$ corresponding to the current parameterized policy $\pi_\theta$, which must obey the Bellman equation

$$Q^{\pi_\theta}(s, a) \;=\; r(s, a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\; a' \sim \pi_\theta(\cdot \mid s')}\!\left[\, Q^{\pi_\theta}(s', a') \,\right].$$
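To make the two coupled updates concrete, here is a minimal one-step actor-critic sketch (illustrative only, not from the notes): the critic is regressed toward a single-sample version of the Bellman equation above, and the actor follows the policy gradient using the critic's (detached) Q-estimate. It assumes PyTorch and a discrete action space; `obs_dim`, `n_actions`, and the network sizes are arbitrary placeholders.

```python
# Minimal one-step actor-critic sketch (illustrative; sizes are placeholders).
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99

# Actor outputs logits of pi_theta(.|s); critic outputs Q(s, .) over actions.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(s, a, r, s_next, done):
    """One transition: s (array), a (int), r (float), s_next (array), done (bool)."""
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)

    # Critic update: one-step TD target r + gamma * Q(s', a') with a' ~ pi_theta,
    # i.e. a single-sample estimate of the Bellman equation above.
    with torch.no_grad():
        a_next = torch.distributions.Categorical(logits=actor(s_next)).sample()
        target = r + gamma * (0.0 if done else 1.0) * critic(s_next)[a_next]
    critic_loss = (critic(s)[a] - target).pow(2)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: policy gradient with the critic's Q-estimate, detached so
    # the actor step leaves the critic's parameters untouched.
    log_pi = torch.distributions.Categorical(logits=actor(s)).log_prob(torch.as_tensor(a))
    actor_loss = -log_pi * critic(s)[a].detach()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In practice the raw Q-estimate in the actor loss is usually replaced by an advantage estimate, e.g. $Q^{\pi_\theta}(s,a) - V^{\pi_\theta}(s)$, to further reduce the variance of the gradient.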
Methods: