Model-Free vs. Model-Based RL

Model-based methods use a model of the environment and rely on planning.

Model-free methods are explicitly trial-and-error learners.

In model-free methods, we don't know the dynamics or the reward function of the system (which is often the real world).

See more justification under Model-Free Control.

If a model is not available, then it is particularly useful to estimate action values (the values of state–action pairs) rather than state values.
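As a minimal sketch of what that looks like, here is tabular Q-learning: the agent never touches transition probabilities or a reward function, it only samples transitions and updates action-value estimates from them. (The environment interface here is an assumption for illustration: `env.reset()` returns a state, `env.step(a)` returns `(next_state, reward, done)`, and `env.n_actions` gives the number of actions.)

```python
import random
from collections import defaultdict

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Model-free control: estimate Q(s, a) purely from sampled transitions."""
    Q = defaultdict(lambda: defaultdict(float))  # Q[state][action] -> estimated return

    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            actions = list(range(env.n_actions))
            # Epsilon-greedy exploration over the current action-value estimates.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[state][a])

            # The only thing we ever observe from the world: one sampled transition.
            next_state, reward, done = env.step(action)

            # Bootstrap toward r + gamma * max_a' Q(s', a'); no P or R is ever queried.
            best_next = 0.0 if done else max(Q[next_state][a] for a in actions)
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state

    return Q
```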

How this differs in implementation

With a model, state values alone are sufficient to determine a policy; one simply looks ahead one step and chooses whichever action leads to the best combination of reward and next state, as we did in the chapter on DP.
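A sketch of that one-step lookahead, assuming the model is given as dictionaries `P[(s, a)] = {s_next: prob}` and `R[(s, a, s_next)] = expected reward`:

```python
def greedy_action_from_V(state, actions, V, P, R, gamma=0.99):
    """With a model, a greedy policy can be read off from state values alone."""
    def one_step_backup(a):
        # Expected reward plus discounted value of the successor states under action a.
        return sum(prob * (R[(state, a, s_next)] + gamma * V[s_next])
                   for s_next, prob in P[(state, a)].items())
    return max(actions, key=one_step_backup)
```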

Without a model, however, state values alone are not sufficient. One must explicitly estimate the value of each action in order for the values to be useful in suggesting a policy.
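By contrast, with action values the greedy choice needs no lookahead and no model at all; using the same nested `Q[state][action]` table as in the Q-learning sketch above:

```python
def greedy_action_from_Q(state, actions, Q):
    """Without a model, action values already summarize the one-step lookahead."""
    return max(actions, key=lambda a: Q[state][a])
```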

What about Value Function Approximation?

I think Value Function Approximation is really a separate topic that deals with a different problem: when the state space is so large that it is very hard to estimate the value functions exactly with a table.
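A minimal sketch of that idea, assuming some feature function `phi(state, action)` that returns a NumPy vector: the table is replaced by a weight vector, updated here with a semi-gradient Q-learning step.

```python
import numpy as np

def semi_gradient_q_step(w, phi, transition, actions, alpha=0.01, gamma=0.99):
    """One semi-gradient Q-learning update with a linear approximator Q(s, a) = w . phi(s, a).
    `w` is a NumPy weight vector, `phi` maps (state, action) to a feature vector,
    and `transition` is one sampled (state, action, reward, next_state, done) tuple."""
    state, action, reward, next_state, done = transition
    q_sa = np.dot(w, phi(state, action))
    target = reward if done else reward + gamma * max(np.dot(w, phi(next_state, a)) for a in actions)
    # Semi-gradient: the target is treated as a constant, so the update direction
    # is just the feature vector phi(state, action).
    return w + alpha * (target - q_sa) * phi(state, action)
```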

Other Fields

Is the future model-based or model-free?

Humans build a model of our environment. The future is likely agents that, in the same way, learn a model of their environment.

The problem is that trial-and-error learning is data hungry: even humans are actually very slow learners when we learn purely by trial and error. So unless you have tons of data, model-based is the way to go.

The difficulty is creating a model of the environment.