Reinforcement Learning

Entropy-Regularized Reinforcement Learning

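In standard reinforcement learning, the agent maximizes the expected discounted return:

$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]$$

where $r(s_t, a_t)$ is the reward received at step $t$, $\gamma \in [0, 1)$ is the discount factor, and $\tau$ is a trajectory generated by following the policy $\pi$.

In entropy-regularized reinforcement learning, the objective becomes:

$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t} \left( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right)\right]$$

where:

  • $\mathcal{H}\big(\pi(\cdot \mid s_t)\big)$ is the entropy of the policy at state $s_t$
  • $\alpha$ is a temperature coefficient that controls the trade-off between reward and entropy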
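
As a rough illustration of how the temperature coefficient trades reward against entropy, the following Python sketch computes an entropy-regularized discounted return for one short trajectory of a discrete-action policy. The function names, the sample trajectory, and the default values of alpha and gamma are assumptions made for this example; none of them come from the text above.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Zero-probability actions contribute nothing, so skip them to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_return(rewards, action_probs, alpha=0.1, gamma=0.99):
    """Discounted sum of (reward + alpha * policy entropy) along one trajectory.

    rewards[t] is the reward at step t; action_probs[t] is the policy's
    action distribution at the state visited at step t (hypothetical inputs).
    """
    total = 0.0
    for t, (reward, probs) in enumerate(zip(rewards, action_probs)):
        total += gamma ** t * (reward + alpha * policy_entropy(probs))
    return total


# Hypothetical three-step trajectory with two available actions per state.
rewards = [1.0, 0.0, 2.0]
action_probs = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]

# alpha = 0 recovers the standard discounted return for this trajectory.
print(entropy_regularized_return(rewards, action_probs, alpha=0.0))
# alpha > 0 adds a bonus for steps where the policy is more random.
print(entropy_regularized_return(rewards, action_probs, alpha=0.2))
```

With alpha set to zero the bonus vanishes and the result is the ordinary discounted return. A larger alpha gives more weight to steps where the action distribution is close to uniform. The sketch scores a single sampled trajectory, whereas the objective is an expectation over trajectories.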