Entropy-Regularized Reinforcement Learning
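In standard reinforcement learning, the agent maximizes the expected return:
$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t} r(s_t, a_t)\right]$$
In entropy-regularized reinforcement learning, the objective becomes:
$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t} \left(r(s_t, a_t) + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t))\right)\right]$$
where:
- $\mathcal{H}(\pi(\cdot \mid s_t))$ is the entropy of the policy at state $s_t$
- $\alpha$ is a temperature coefficient that controls the trade-off between reward and entropy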
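As a rough numerical sketch of the entropy-regularized objective, the snippet below adds a temperature-weighted entropy bonus to each reward along a single trajectory. It assumes a discrete action space so the entropy is a simple sum; the names `entropy` and `regularized_return`, the example rewards, and the optional discount factor `gamma` are illustrative assumptions rather than part of the text above.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_return(rewards, action_probs, alpha=0.2, gamma=1.0):
    """Sum of r_t + alpha * H(pi(.|s_t)) along one trajectory.

    gamma is an optional discount factor (an assumption; 1.0 means no discounting).
    Setting alpha=0 recovers the standard return.
    """
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, action_probs)):
        total += gamma ** t * (r + alpha * entropy(probs))
    return total


# Hypothetical three-step trajectory with two actions per state.
rewards = [1.0, 0.0, 2.0]
uniform_policy = [[0.5, 0.5]] * 3        # maximally random at every state
deterministic_policy = [[1.0, 0.0]] * 3  # zero entropy at every state

print(regularized_return(rewards, uniform_policy))
print(regularized_return(rewards, deterministic_policy))
```

For the same rewards, the uniform policy scores higher than the deterministic one because it collects the entropy bonus at every step; with `alpha=0` the two values coincide.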
Controlling exploration
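You can think of the entropy term as a way to get exploration when working with a continuous action space: a larger temperature favors more stochastic policies, while a smaller one favors exploiting reward.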
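To make the continuous case concrete, here is a minimal sketch that assumes a one-dimensional Gaussian policy (an assumption for illustration; the text does not name a policy class). The differential entropy of a Gaussian depends only on its standard deviation, so an entropy bonus rewards the policy for keeping its action distribution wide. The function name `gaussian_entropy` is hypothetical.

```python
import math


def gaussian_entropy(std):
    """Differential entropy (nats) of a 1-D Gaussian with standard deviation std."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


# Wider action distributions have higher entropy, so they earn a larger bonus.
for std in (0.1, 0.5, 1.0):
    print(f"std={std}: entropy={gaussian_entropy(std):.3f}")
```

Note that differential entropy can be negative for narrow distributions, so in this setting the bonus acts as a penalty on policies that collapse toward a single action.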