Deep Q-Network (DQN)

See the original, legendary paper published by DeepMind in 2015.

Resources

When to use DQN?

I don’t have a strong answer for this myself, but Pieter Abbeel gives the intuition around ~1:00 in lecture 3:

  • DQN is sample-efficient, but often not as stable as policy gradient methods. DQN is off-policy; see Policy Gradient Methods for on-policy methods.

Notes from Pieter Abbeel

Unlike in regular Q-Learning, where we only update a table of values Q(s, a), here we try to have a neural network learn Q(s, a; θ).
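
A rough sketch of what that looks like (assuming PyTorch; the state dimension, number of actions, and layer sizes below are arbitrary placeholders, not the paper's architecture): the Q-table is replaced by a network that maps a state to one Q-value per action.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action, replacing the Q-table."""
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action selection: argmax over the predicted Q-values.
q = QNetwork(state_dim=4, num_actions=2)
state = torch.randn(1, 4)
action = q(state).argmax(dim=-1)
```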

Taken from the original paper:

There are a few interesting things from the above pseudocode:

  • They use two Q-values: Q, the one we are learning (with weights θ), and a target network Q̂ (with weights θ⁻). This helps stabilize learning, because the target changes more slowly; the two networks are delayed relative to one another (see the sketch after this list)
    • Q̂ is “lagging behind” Q
  • The states φ_t fed to the network are stacks of several consecutive Atari frames, because a single frame doesn’t have enough information (you need to know velocity, i.e. which direction the ball is moving)
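
A minimal sketch of how the lagging target network is typically wired up (assuming PyTorch; the discount factor, sync interval, and layer sizes are placeholder values, not the paper's hyperparameters):

```python
import copy
import torch
import torch.nn as nn

gamma = 0.99        # discount factor (placeholder value)
sync_every = 1000   # gradient steps between target-network syncs (placeholder value)

# Q (online, trained every step) and Q-hat (a frozen copy that lags behind).
q_online = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
q_target = copy.deepcopy(q_online)

def td_target(reward, next_state, done):
    """Bootstrapped target r + gamma * max_a' Q-hat(s', a'), using the frozen network."""
    with torch.no_grad():
        next_q = q_target(next_state).max(dim=-1).values
    return reward + gamma * (1.0 - done) * next_q

# Every `sync_every` gradient steps, Q-hat catches up to Q:
# q_target.load_state_dict(q_online.state_dict())
```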

Double DQN

Essentially all DQN implementations today use Double DQN, simply because it works better: it reduces the overestimation bias that the max operator introduces in standard Q-learning, at almost no extra cost.
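
A sketch of the change Double DQN makes to the target computation (assuming PyTorch and an online/target network pair like the one above; the function name is my own): the online network selects the next action, and the target network evaluates it.

```python
import torch

def double_dqn_target(q_online, q_target, reward, next_state, done, gamma=0.99):
    """Double DQN target: online net picks the action, target net scores it."""
    with torch.no_grad():
        # Online network chooses the greedy action...
        best_action = q_online(next_state).argmax(dim=-1, keepdim=True)
        # ...target network evaluates that action, reducing overestimation bias.
        next_q = q_target(next_state).gather(-1, best_action).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_q
```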

Stanford Notes

  • Uses experience replay and fixed Q-targets (a minimal replay buffer is sketched after this list)
  • Uses stochastic gradient descent
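
A minimal sketch of an experience replay buffer (plain Python; the capacity and transition layout are placeholders, not the paper's or Stanford's implementation). Storing transitions and sampling them uniformly breaks the correlation between consecutive frames and lets each transition be reused for multiple updates.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions so gradient updates use decorrelated, reusable samples."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```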

Achieved human-level performance on a number of Atari Games.

https://lilianweng.github.io/posts/2018-02-19-rl-overview/#deep-q-network

Sample an experience (s, a, r, s') from the replay memory. Compute the target value y = r + γ·max_a' Q̂(s', a'; θ⁻).

  • Double DQN
    • Uses two different sets of weights, θ and θ⁻: the online network selects the next action and the target network evaluates it
  • Dueling DQN
    • Uses the advantage function, A(s, a) = Q(s, a) − V(s), splitting the network into a value stream and an advantage stream (sketched below)
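
A sketch of the dueling head (assuming PyTorch; class name and layer sizes are my own placeholders): a shared trunk feeds a state-value stream V(s) and an advantage stream A(s, a), which are recombined with the mean-subtracted aggregation used in the Dueling DQN paper.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Splits the head into a state-value stream V(s) and an advantage stream A(s, a)."""
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.value = nn.Linear(hidden_dim, 1)                 # V(s)
        self.advantage = nn.Linear(hidden_dim, num_actions)   # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)
        a = self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a): subtracting the mean keeps
        # the decomposition identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)
```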