Deep Q-Network (DQN)
See the original, legendary paper published in 2015 by DeepMind (Mnih et al., Nature).
Resources
- Slides for Lecture 2: Deep Q-Learning, from the Deep RL Foundations series (video here)
When to use DQN?
I don't know this myself, but Pieter Abbeel gives the intuition for it around ~1:00 of Lecture 3:
- DQN is sample-efficient, but often not as stable as policy gradient methods. DQN is off-policy; see Policy Gradient Methods for on-policy alternatives.
Notes from Pieter Abbeel
Unlike in regular Q-Learning, instead of only having to update a table of values $Q(s, a)$, we try to have a neural network with weights $\theta$ learn $Q_\theta(s, a)$.
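A minimal sketch of what such a network can look like in PyTorch; the plain MLP over a flat state vector and the layer sizes are my simplifying assumptions here (the paper itself uses a convolutional network over stacked frames):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action, i.e. Q_theta(s, ·)."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Returns a tensor of shape (batch, num_actions).
        return self.net(state)
```

Greedy action selection is then `q_net(state).argmax(dim=-1)`; ε-greedy exploration mixes this with random actions.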
Taken from the original paper (Algorithm 1: deep Q-learning with experience replay); a rough sketch of the same loop in code follows the list below.
There are a few interesting things from this pseudocode:
- They use two Q-value functions: $Q_\theta$ (the Q that we are learning) and a target network $Q_{\theta^-}$. This helps stabilize the learning, as the two sets of weights are delayed relative to one another
- $Q_{\theta^-}$ is "lagging behind" $Q_\theta$: it is only refreshed to a copy of $Q_\theta$ every fixed number of steps
- The inputs $\phi_t$ are stacked frames from the Atari game, because a single frame doesn't have enough information (you need to know velocity, i.e. which direction the ball is moving)
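A rough, hedged sketch of that training loop in Python, using the `QNetwork` sketch above. The hyperparameters, the flat-state assumption, and the Gymnasium-style `env.reset()` / `env.step()` interface are my assumptions, not the paper's exact procedure:

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F

def train_dqn(env, q_net, num_steps=100_000, gamma=0.99, batch_size=32,
              buffer_size=100_000, target_sync_every=1_000, epsilon=0.1, lr=1e-4):
    # Q_theta^- starts as a copy of Q_theta and is only refreshed periodically.
    target_net = QNetwork(env.observation_space.shape[0], env.action_space.n)
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=buffer_size)  # experience replay buffer D

    state, _ = env.reset()
    for step in range(num_steps):
        # Epsilon-greedy action selection with the online network Q_theta.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
                action = int(q_values.argmax())

        next_state, reward, terminated, truncated, _ = env.step(action)
        replay.append((state, action, reward, next_state, float(terminated)))
        state = next_state if not (terminated or truncated) else env.reset()[0]

        if len(replay) >= batch_size:
            s, a, r, s2, done = zip(*random.sample(replay, batch_size))
            s = torch.as_tensor(np.array(s), dtype=torch.float32)
            a = torch.as_tensor(a, dtype=torch.int64)
            r = torch.as_tensor(r, dtype=torch.float32)
            s2 = torch.as_tensor(np.array(s2), dtype=torch.float32)
            done = torch.as_tensor(done, dtype=torch.float32)

            # Target uses the lagging network Q_theta^- ("fixed Q-targets").
            with torch.no_grad():
                y = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

            loss = F.smooth_l1_loss(q, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Periodically copy the online weights into the target network.
        if step % target_sync_every == 0:
            target_net.load_state_dict(q_net.state_dict())
```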
Double DQN
All DQN implementations today use Double DQN, simply because it works better: it reduces the overestimation bias that comes from using the same network both to select and to evaluate the action in the target.
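A hedged sketch of the change: compared to the training loop above, only the target computation differs. The helper name `double_dqn_targets` is mine, not from any library:

```python
import torch

def double_dqn_targets(q_net, target_net, r, s2, done, gamma):
    """Double DQN targets: Q_theta picks the action, Q_theta^- scores it."""
    with torch.no_grad():
        # Vanilla DQN would instead use: target_net(s2).max(dim=1).values
        best_a = q_net(s2).argmax(dim=1, keepdim=True)        # chosen by the online net
        next_q = target_net(s2).gather(1, best_a).squeeze(1)  # evaluated by the lagging net
        return r + gamma * (1.0 - done) * next_q
```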
Stanford Notes
- Uses experience replay and fixed Q-targets
- Uses stochastic gradient descent
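Putting those two bullets together: stochastic gradient descent is run on minibatches sampled from the replay buffer $D$, minimizing the squared TD error against a target computed with the fixed weights $\theta^-$:

$$
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim D}\Big[\big(r + \gamma \max_{a'} Q_{\theta^-}(s', a') - Q_\theta(s, a)\big)^2\Big]
$$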
Achieved human-level performance on a number of Atari games.
https://lilianweng.github.io/posts/2018-02-19-rl-overview/#deep-q-network
Sample an experience $(s, a, r, s')$ from the dataset.
Compute the target value $y = r + \gamma \max_{a'} Q_{\theta^-}(s', a')$.
- Double DQN
- Use two different sets of weights, $\theta$ and $\theta^-$: the online network selects the action, the lagging network evaluates it
- Dueling DQN
- Uses the advantage function, $A(s, a) = Q(s, a) - V(s)$, with separate network streams estimating $V(s)$ and $A(s, a)$; see the sketch below
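A minimal sketch of a dueling head in PyTorch (layer sizes are my assumptions; subtracting the mean advantage is the standard identifiability trick from the Dueling DQN paper):

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # V(s) stream
        self.advantage = nn.Linear(hidden, num_actions)   # A(s, a) stream

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v = self.value(h)       # (batch, 1)
        a = self.advantage(h)   # (batch, num_actions)
        # Subtracting the mean advantage makes V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

Training is otherwise unchanged: this module can be dropped in wherever `QNetwork` is used in the sketches above.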