Reinforcement Learning

Entropy-Regularized Reinforcement Learning

In standard reinforcement learning, the agent maximizes the expected return:

$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t} r(s_t, a_t)\right]$$

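In entropy-regularized reinforcement learning, the objective becomes:

$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t} \big(r(s_t, a_t) + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t))\big)\right]$$

where:

  • $\mathcal{H}(\pi(\cdot \mid s_t))$ is the entropy of the policy at state $s_t$
  • $\alpha$ is a temperature coefficient that controls the trade-off between reward and entropy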
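To make the entropy bonus concrete, here is a minimal sketch that scores a single sampled trajectory with and without the bonus. It assumes a discrete action space so the entropy is a simple sum, and it leaves out discounting. The names `entropy`, `regularized_return`, and `alpha`, along with the toy numbers, are illustrative assumptions and not part of any particular library. The true objective is an expectation over trajectories, whereas this evaluates one sample only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_return(rewards, action_probs, alpha):
    """Sum over time steps of r_t + alpha * H(pi(.|s_t)) for one trajectory."""
    return sum(r + alpha * entropy(p) for r, p in zip(rewards, action_probs))


# Toy trajectory (made-up values): one reward and one action distribution per step.
rewards = [1.0, 0.0, 2.0]
action_probs = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]

plain = regularized_return(rewards, action_probs, alpha=0.0)  # ordinary return
soft = regularized_return(rewards, action_probs, alpha=0.2)   # with entropy bonus
print(plain, soft)
```

With the temperature set to zero the score reduces to the ordinary return. With a positive temperature, steps where the policy is more uncertain add a larger bonus, while a deterministic step such as the last one adds none.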

Controlling exploration

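You can think of the entropy term as a way to obtain exploration when working with a continuous action space. A larger temperature rewards more stochastic policies, so the agent explores more; a smaller temperature pushes it toward greedier behavior.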
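As a rough illustration for the continuous case, the sketch below assumes a one-dimensional Gaussian policy, a common choice but an assumption here, and computes the entropy bonus for a narrow and a wide policy. The function names and the temperature value are hypothetical.

```python
import math


def gaussian_entropy(std):
    """Differential entropy (in nats) of a 1-D Gaussian with standard deviation std."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def entropy_bonus(std, alpha):
    """Entropy term added to the reward at one step, scaled by the temperature alpha."""
    return alpha * gaussian_entropy(std)


# Same temperature, two policies: one nearly deterministic, one spread out.
narrow = entropy_bonus(std=0.1, alpha=0.2)
wide = entropy_bonus(std=1.0, alpha=0.2)
print(narrow, wide)
```

The wider policy receives the larger bonus, so maximizing the regularized objective discourages the standard deviation from collapsing too early. Note that differential entropy can be negative, as it is for the narrow policy here.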