Multi-Armed Bandit (MAB)

The simplest form of the Reinforcement Learning problem.

This was an introductory chapter to RL, used to explore the exploration vs. exploitation problem.

This article is a good brush-up on the math behind the MAB algorithms: https://lilianweng.github.io/posts/2018-01-23-multi-armed-bandit/#exploitation-vs-exploration

Popular MAB algorithms, based on different ideas for encouraging exploration (a sketch of the first two ideas follows this list):

  • Random Exploration
  • Optimism in the face of uncertainty
  • Information State Space (treat the agent’s information as part of its state, and look ahead to see how gaining information helps future reward); this essentially transforms the bandit problem back into an MDP
    • Gittins indices
    • Bayes-adaptive MDPs
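A minimal Python sketch of the first two ideas. The function names, the Bernoulli test bed, and the parameter values are my own illustration, not from the chapter; only the ε-greedy rule and the UCB1 bonus formula are standard.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def epsilon_greedy(Q, epsilon=0.1):
    """Random exploration: with probability epsilon pull a uniformly
    random arm, otherwise pull the arm with the highest estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q)))
    return int(np.argmax(Q))

def ucb1(Q, counts, t):
    """Optimism in the face of uncertainty (UCB1): pull the arm maximizing
    estimate + bonus, where the bonus shrinks as the arm gets pulled more."""
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:                      # pull every arm once first
        return int(untried[0])
    bonus = np.sqrt(2.0 * np.log(t) / counts)
    return int(np.argmax(Q + bonus))

# Tiny Bernoulli-bandit loop (illustrative setup).
true_means = np.array([0.2, 0.5, 0.7])        # hidden arm probabilities
Q, counts = np.zeros(3), np.zeros(3)
for t in range(1, 1001):
    a = ucb1(Q, counts, t)
    r = float(rng.random() < true_means[a])   # Bernoulli reward
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]            # incremental mean (see below)
```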

To compare the performance of various bandit algorithms, conduct a Parameter Study: sweep each algorithm's exploration hyperparameter over a range and record the average reward achieved at each setting.
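A sketch of what such a study could look like, reusing `rng` and `epsilon_greedy` from the snippet above; the test bed and the parameter grid are illustrative assumptions.

```python
def run_bandit(select_action, n_arms=10, n_steps=1000, n_runs=200):
    """Average reward over the first n_steps, averaged over n_runs
    randomly drawn Bernoulli bandit problems."""
    total = 0.0
    for _ in range(n_runs):
        true_means = rng.random(n_arms)       # a fresh random problem
        Q, counts = np.zeros(n_arms), np.zeros(n_arms)
        for t in range(1, n_steps + 1):
            a = select_action(Q, counts, t)
            r = float(rng.random() < true_means[a])
            counts[a] += 1
            Q[a] += (r - Q[a]) / counts[a]
            total += r
    return total / (n_runs * n_steps)

# Sweep the exploration parameter and compare scores at each setting.
for eps in (0.0, 0.01, 0.1, 0.25):
    score = run_bandit(lambda Q, c, t, e=eps: epsilon_greedy(Q, e))
    print(f"epsilon = {eps}: average reward {score:.3f}")
```

Plotting the score against the parameter value, with one curve per algorithm, gives the usual parameter-study figure.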

Resources to look into

  • Go back home and review k-armed bandits from CS239 or something, plus the first chapter of the book.

For the non-stationary problem (also known as “Concept Drift”), there are some papers to look into.

Incremental implementation. This is very common; it is essentially the incremental mean.

The initial equation is

$$Q_{n+1} = Q_n + \frac{1}{n}\left(R_n - Q_n\right)$$
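As a quick check that this is just the sample mean in disguise:

$$Q_{n+1} = \frac{1}{n}\sum_{i=1}^{n} R_i = \frac{1}{n}\Big(R_n + (n-1)\,Q_n\Big) = Q_n + \frac{1}{n}\left(R_n - Q_n\right)$$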

We used two approaches:

  1. Cap the step size $\frac{1}{n}$ at a fixed minimum, where for example $\alpha_n = \max\!\left(\tfrac{1}{n}, \alpha\right)$ for some constant $\alpha$
  2. Use the form $Q_{n+1} = Q_n + \alpha\,(R_n - Q_n)$ with a constant $\alpha$ (weight decay factor); see the sketch after this list
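A minimal sketch of approach 2; the constant $\alpha = 0.1$ is an illustrative choice, not from the notes.

```python
def update_nonstationary(Q, a, r, alpha=0.1):
    """Constant step size: unlike 1/n, alpha never decays, so the
    estimate keeps adapting when the arm's reward distribution drifts."""
    # Drop-in replacement for `Q[a] += (r - Q[a]) / counts[a]` in the
    # loops above when the problem is non-stationary.
    Q[a] += alpha * (r - Q[a])
```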

We know that unrolling this update gives an exponential recency-weighted average:

$$Q_{n+1} = (1-\alpha)^n Q_1 + \sum_{i=1}^{n} \alpha\,(1-\alpha)^{n-i} R_i$$

so the weight on each reward decays geometrically with its age.
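As a sanity check, these weights sum to one (using the geometric series), so $Q_{n+1}$ is a proper weighted average of $Q_1$ and the rewards:

$$(1-\alpha)^n + \sum_{i=1}^{n} \alpha\,(1-\alpha)^{n-i} = (1-\alpha)^n + \alpha \cdot \frac{1-(1-\alpha)^n}{\alpha} = 1$$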