RL Agent

An RL agent may include one or more of these components:

  • Policy: agent’s behaviour function
  • Value Function: how good is each state and/or action
  • Model: agent’s representation of the environment


A model is not required; whether the agent learns one is exactly the difference between model-free and model-based RL algorithms.
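The three components above can be made concrete with a minimal sketch. Everything here (the chain of states, the reward numbers) is illustrative, not from any particular environment or library:

```python
# Tiny 3-state chain, moving right toward a rewarding terminal state.
states = [0, 1, 2]
actions = ["left", "right"]

# Policy: the agent's behaviour function, mapping state -> action
policy = {0: "right", 1: "right", 2: "right"}

# Value function: how good each state is (expected return, made-up numbers)
value = {0: 0.5, 1: 0.8, 2: 1.0}

# Model: the agent's representation of the environment,
# mapping (state, action) -> (next_state, reward)
model = {
    (0, "right"): (1, 0.0),
    (1, "right"): (2, 1.0),
    (2, "right"): (2, 0.0),
}

def act(state):
    """Follow the policy."""
    return policy[state]

def simulate(state, action):
    """Query the model for a predicted next state and reward."""
    return model[(state, action)]
```

A model-free agent would keep only `policy` and/or `value`; a model-based agent also maintains something like `model` and can plan with `simulate` without touching the real environment.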

Categories of RL Agents

  • Value-Based
    • No explicit policy (the policy is implicit, e.g. ε-greedy over the value function); there is a value function
  • Policy-Based → Policy Gradient Methods
    • Policy, and no Value Function
  • Actor Critic
    • Stores both policy and value function

    • Critic: updates the action-value function parameters w

    • Actor: updates the policy parameters θ, in the direction suggested by the critic

    • Actor-critic algorithms follow an approximate policy gradient
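The actor-critic loop above can be sketched on a toy one-state, two-action problem. Everything here (the reward function, learning rates, use of the raw action value rather than an advantage as the critic's signal) is an illustrative assumption, not a canonical implementation:

```python
import math
import random

random.seed(0)

# One-state, two-action toy problem: action 1 pays more on average.
def reward(action):
    return 1.0 if action == 1 else 0.2

theta = [0.0, 0.0]   # actor: softmax policy preferences (parameters θ)
q = [0.0, 0.0]       # critic: action-value estimates (parameters w)
alpha_actor, alpha_critic = 0.1, 0.2

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    r = reward(a)

    # Critic: move the action-value estimate toward the observed reward
    q[a] += alpha_critic * (r - q[a])

    # Actor: approximate policy-gradient step, scaled by the critic's estimate
    for b in range(2):
        grad = (1.0 if b == a else 0.0) - probs[b]  # d log pi(a) / d theta_b
        theta[b] += alpha_actor * q[a] * grad

final_probs = softmax(theta)
```

After training, the actor's probability mass concentrates on the better-paying action, guided entirely by the critic's value estimates rather than by the rewards directly.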

  • Model-Free: policy and/or value function, no model
  • Model-Based: policy and/or value function, with a model
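For the value-based category, the implicit policy is typically ε-greedy: act greedily with respect to the value function, except for occasional exploration. A minimal sketch, with made-up action values for a single state:

```python
import random

random.seed(0)

# Action-value estimates for one state (illustrative numbers)
q_values = {"left": 0.1, "up": 0.5, "right": 0.9}

def epsilon_greedy(q, epsilon=0.1):
    """Implicit policy: explore with probability epsilon, else act greedily."""
    if random.random() < epsilon:
        return random.choice(list(q))  # explore: uniform random action
    return max(q, key=q.get)           # exploit: argmax over action values
```

No policy is stored anywhere; the behaviour falls out of the value function plus this selection rule.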

Policy-Based vs. Value-Based Method

Why would you choose a policy-based method over a value-based method?

The value function might be quite complicated, while the policy can be very simple to represent. In such cases the policy is the more compact object to learn.

See Policy Gradient Methods for advantages and disadvantages.