Entropy-Regularized Reinforcement Learning
In standard reinforcement learning, the agent maximizes the expected return:
$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} r(s_t, a_t)\right]$$
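In entropy-regularized reinforcement learning, the objective becomes:
$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)\right]$$
where:
- $\mathcal{H}(\pi(\cdot \mid s_t))$ is the entropy of the policy at state $s_t$
- $\alpha$ is a temperature coefficient that controls the trade-off between reward and entropy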
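
To make the trade-off between reward and entropy concrete, here is a minimal sketch that scores a single sampled trajectory under a discrete policy. It assumes an undiscounted, finite-horizon sum and Shannon entropy in nats. The function names, the `alpha` parameter name, and the toy trajectory are illustrative assumptions, not part of any particular library.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Zero-probability actions contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_return(rewards, action_probs, alpha):
    """Sum of r_t + alpha * H(pi(.|s_t)) along one trajectory.

    rewards[t] is the reward at step t; action_probs[t] is the policy's
    action distribution at the state visited at step t.
    """
    return sum(
        r + alpha * policy_entropy(p)
        for r, p in zip(rewards, action_probs)
    )


# Hypothetical 3-step trajectory with two actions per state.
rewards = [1.0, 0.0, 2.0]
action_probs = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]

# alpha = 0 recovers the standard (unregularized) return.
standard = entropy_regularized_return(rewards, action_probs, alpha=0.0)
regularized = entropy_regularized_return(rewards, action_probs, alpha=0.2)
print(standard, regularized)
```

Setting the temperature to zero recovers the plain sum of rewards, while a positive temperature adds a bonus at every state where the policy is still stochastic. The deterministic final step contributes no bonus. The objective itself is an expectation over trajectories, so in practice this quantity would be averaged over many sampled rollouts.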