RL Agent
An RL agent may include one or more of these components:
- Policy: agent’s behaviour function
- Value Function: how good is each state and/or action
- Model: agent’s representation of the environment
Model
A model is not required; whether the agent learns and uses one is exactly the difference between model-free and model-based RL algorithms.
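As a quick illustration of how these components fit together, here is a minimal skeleton (an assumed interface for illustration, not something from the notes) where each component is an optional method on the agent:

```python
class RLAgent:
    """Skeleton of an RL agent's (optional) components; purely illustrative."""

    def policy(self, state):
        """Behaviour function: map a state to an action (or a distribution over actions)."""
        raise NotImplementedError

    def value(self, state, action=None):
        """Value function: how good a state (or state-action pair) is, as an expected return."""
        raise NotImplementedError

    def model(self, state, action):
        """Model: the agent's prediction of the next state and reward."""
        raise NotImplementedError
```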
Categories of RL Agents
- Value-Based
  - No explicit policy (the policy is implicit, e.g. ε-greedy over the value function); there is a Value Function (see the first sketch after this list)
- Policy-Based → Policy Gradient Methods
  - Policy, but no Value Function
- Actor Critic
  - Stores both a policy and a value function
  - Critic: updates the action-value function parameters
  - Actor: updates the policy parameters, in the direction suggested by the critic
  - Actor-critic algorithms follow an approximate policy gradient (see the second sketch after this list)
- Model-Free → Policy and/or value function, no model
- Model-Based → Policy and/or value function, with a model
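A minimal sketch of the value-based case, where the policy is only implicit: actions are chosen ε-greedily from a learned action-value table (a tabular Q-learning-style agent; the environment interface and hyperparameters are assumptions for illustration):

```python
import random
from collections import defaultdict

class ValueBasedAgent:
    """Value-based agent: no explicit policy object, only an action-value table Q.
    The behaviour is derived implicitly from Q via epsilon-greedy selection."""

    def __init__(self, actions, epsilon=0.1, alpha=0.5, gamma=0.99):
        self.q = defaultdict(float)   # Q[(state, action)] -> estimated return
        self.actions = actions
        self.epsilon = epsilon        # exploration rate
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def act(self, state):
        # Implicit policy: explore with probability epsilon, otherwise act greedily on Q.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning backup toward the greedy value of the next state.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```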
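And a sketch of a one-step actor-critic update, where the critic learns a state-value estimate and its TD error drives the actor's approximate policy-gradient step. The linear features, softmax policy, and step sizes are assumptions for illustration:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class ActorCritic:
    """One-step actor-critic with linear function approximation (illustrative sketch)."""

    def __init__(self, n_features, n_actions, alpha_actor=0.01, alpha_critic=0.1, gamma=0.99):
        self.theta = np.zeros((n_actions, n_features))  # actor: policy parameters
        self.w = np.zeros(n_features)                   # critic: value-function parameters
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma

    def policy(self, x):
        # Softmax policy over linear action preferences theta @ x.
        return softmax(self.theta @ x)

    def act(self, x):
        return np.random.choice(len(self.theta), p=self.policy(x))

    def update(self, x, action, reward, x_next, done):
        # Critic: TD error of the state-value estimate v(x) = w . x,
        # then update the value-function parameters toward the TD target.
        v = self.w @ x
        v_next = 0.0 if done else self.w @ x_next
        td_error = reward + self.gamma * v_next - v
        self.w += self.alpha_critic * td_error * x

        # Actor: move the policy parameters in the direction suggested by the critic
        # (approximate policy gradient: grad log pi(a|x) scaled by the TD error).
        probs = self.policy(x)
        grad_log_pi = -np.outer(probs, x)
        grad_log_pi[action] += x
        self.theta += self.alpha_actor * td_error * grad_log_pi
```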
Policy-Based vs. Value-Based Methods
Why would you choose a policy-based method over a value-based one?
The optimal value function might be quite complicated, while the policy needed to act well might be very simple to represent. In such cases the policy is the more compact object to learn, as in the sketch below.
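A toy illustration of that point (a hypothetical corridor gridworld with reward -1 per step and no discounting; the example is an assumption, not from the notes): the optimal policy is the same single rule in every state, while the optimal value function needs a separate number per state.

```python
# Corridor of N states with the goal at the right end; reward -1 per step, no discounting.
N = 10

# Optimal policy: identical in every state, so it is trivially compact to represent.
policy = {s: "right" for s in range(N)}

# Optimal value function: varies with the distance to the goal, one number per state.
value = {s: -(N - 1 - s) for s in range(N)}

print(policy[0], policy[7])   # right right
print(value[0], value[7])     # -9 -2
```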
See Policy Gradient Methods for advantages and disadvantages.