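Reinforcement Learning

Entropy-Regularized Reinforcement Learning

$$\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\tau \sim \pi} \left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( R(s_t, a_t, s_{t+1}) + \alpha \, H\big(\pi(\cdot \mid s_t)\big) \Big) \right]$$

Related: n-step Reinforcement Learning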
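
To make the objective above concrete, here is a minimal Python sketch that evaluates the quantity inside the expectation for a single finite trajectory. It is an illustration under stated assumptions, not a reference implementation: the function names `entropy` and `entropy_regularized_return`, the example trajectory, and the values of `gamma` and `alpha` are all hypothetical, the action space is assumed discrete, H is taken to be Shannon entropy in nats, and the infinite sum is truncated at the trajectory length. Estimating the full objective would require averaging this value over many trajectories sampled from π.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Zero-probability actions contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_return(rewards, action_dists, gamma=0.99, alpha=0.1):
    """Discounted sum of r_t + alpha * H(pi(.|s_t)) over one finite trajectory.

    rewards[t] stands in for R(s_t, a_t, s_{t+1});
    action_dists[t] is the policy's action distribution pi(.|s_t).
    """
    if len(rewards) != len(action_dists):
        raise ValueError("rewards and action_dists must have the same length")
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, action_dists)):
        total += (gamma ** t) * (r + alpha * entropy(probs))
    return total


# Hypothetical three-step trajectory with two actions per state.
rewards = [1.0, 0.0, 2.0]
action_dists = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]

plain = entropy_regularized_return(rewards, action_dists, gamma=0.9, alpha=0.0)
regularized = entropy_regularized_return(rewards, action_dists, gamma=0.9, alpha=0.2)
print(plain, regularized)
```

With `alpha=0.0` the function reduces to the ordinary discounted return. With `alpha > 0`, steps where the policy is closer to uniform add a bonus, while a deterministic step such as `[1.0, 0.0]` adds none. This is how the temperature α trades reward against policy randomness.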