Dynamic Programming in Reinforcement Learning

Dynamic programming is used to solve many other problems, e.g.

  • Scheduling algorithms
  • String algorithms (e.g. sequence alignment)
  • Graph algorithms (e.g. shortest path algorithms)
  • Graphical models (e.g. Viterbi algorithm)
  • Bioinformatics (e.g. lattice models)

Dynamic sequential or temporal component to the problem Programming optimising a “program”, i.e. a policy

  • c.f. linear programming

A method for solving complex problems

  • By breaking them down into subproblems
    • Solve the subproblems
    • Combine solutions to subproblems

Policy Iteration Value Iteration

Extensions

  • In-place dynamic programming
  • Prioritized sweeping
    • Use a priority queue
  • Real-time dynamic programming

We will consider sample backups.