Multi-Armed Bandit (MAB)

Simplest form of the Reinforcement Learning problem.

Serves as an introductory chapter to RL, used to explore the Exploration vs. Exploitation problem.

Popular MAB algorithms, based on different ideas of ways to encourage exploration:

  • Random Exploration
  • Optimism in the face of uncertainty
  • Information State Space: consider the agent’s information as part of its state, and look ahead to see how information helps reward. This basically transforms the bandit problem back into an MDP problem.
    • Gittins indices
    • Bayes-adaptive MDPs
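
To make the first two ideas concrete, here is a minimal sketch of one action-selection rule for each: epsilon-greedy for random exploration, and a UCB-style rule for optimism in the face of uncertainty. The function names, the parameters epsilon and c, and the bonus form c * sqrt(ln t / n) are assumptions chosen for illustration, not something taken from these notes.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Random exploration: pick a random arm with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Otherwise exploit the arm with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ucb(q_values, counts, t, c=2.0):
    """Optimism: add an uncertainty bonus that shrinks as an arm is pulled more.

    t is the current time step (>= 1); c is an assumed exploration constant.
    """
    # Pull every arm once first so the bonus term is well defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(q_values)),
        key=lambda a: q_values[a] + c * math.sqrt(math.log(t) / counts[a]),
    )
```

With equal value estimates, the UCB-style rule prefers the arm that has been pulled less often, which is the sense in which it is "optimistic" about uncertain arms.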

To compare the performances of various bandit algorithms, conduct a Parameter Study.
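
A parameter study can be sketched as a sweep over one hyperparameter, with the average reward recorded for each setting. The example below sweeps epsilon for an epsilon-greedy agent on a Gaussian testbed; the testbed (10 arms, unit-variance rewards), the run counts, and all function names are assumptions for illustration only.

```python
import random


def run_epsilon_greedy(epsilon, arm_means, steps, rng):
    """Play one bandit run and return the average reward per step."""
    k = len(arm_means)
    q = [0.0] * k  # value estimate per arm
    n = [0] * k    # pull count per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit
        reward = rng.gauss(arm_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # running average of rewards
        total += reward
    return total / steps


def parameter_study(epsilons, runs=50, k=10, steps=500, seed=0):
    """Map each epsilon to its average reward over freshly drawn bandits."""
    rng = random.Random(seed)
    results = {}
    for eps in epsilons:
        scores = []
        for _ in range(runs):
            arm_means = [rng.gauss(0.0, 1.0) for _ in range(k)]
            scores.append(run_epsilon_greedy(eps, arm_means, steps, rng))
        results[eps] = sum(scores) / runs
    return results


if __name__ == "__main__":
    for eps, score in parameter_study([0.0, 0.01, 0.1, 0.5]).items():
        print(f"epsilon={eps:<5} average reward={score:.3f}")
```

Plotting the resulting scores against the parameter value (often on a log scale) gives one curve per algorithm, which makes the algorithms comparable at their best settings.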

For the non-stationary problem (also known as “Concept Drift”), we have

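Incremental implementation: update the running estimate from each new reward instead of storing and re-averaging all past rewards. This is very common, and works much like the Incremental Mean.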
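
As a rough sketch of the incremental idea, the two updates below contrast the plain incremental mean with a constant step-size variant. The constant step size alpha is an assumption here: it is one common way to track a drifting reward distribution, not necessarily the update these notes had in mind, and the function names are hypothetical.

```python
def incremental_mean(estimate, count, reward):
    """Sample-average update; returns the new estimate and the new count."""
    count += 1
    estimate += (reward - estimate) / count
    return estimate, count


def constant_step_update(estimate, reward, alpha=0.1):
    """Constant step-size update (alpha is assumed): recent rewards weigh more."""
    return estimate + alpha * (reward - estimate)


if __name__ == "__main__":
    q, n = 0.0, 0
    for r in [1.0, 2.0, 3.0, 4.0]:
        q, n = incremental_mean(q, n, r)
    print(q, n)  # 2.5 4
```

Both updates have the form estimate + step * (reward - estimate). With step 1/count every reward is weighted equally, while a fixed alpha lets old rewards fade, which suits a non-stationary problem.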

