Reinforcement Learning

Exploration and Exploitation

The tradeoff between exploration and exploitation is one of the central challenges in Reinforcement Learning. But also in life.

On Life

I've recently been finding that life is a lot about exploration vs. exploitation.

  • You want to try lots of new experiences (exploration), but you should also keep doing the things you are good at (exploitation), because that is what pays.
  • You want to learn everything (exploration), but you only have finite time to work on a certain Important Problem (exploitation).
  • You want to spend your time with the people close to you and not waste it (exploitation), but you also want to meet new and interesting people (exploration), even though the search for them can waste a lot of time.

Is Work-Life Balance a form of the exploration-exploitation tradeoff?

This article explains it super well: https://lilianweng.github.io/posts/2018-01-23-multi-armed-bandit/#exploitation-vs-exploration

The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best.

To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future.
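A minimal sketch of that balancing act is ε-greedy on a toy multi-armed bandit (the setting of the article linked above). Everything here is made up for illustration: the function name, the Gaussian arm rewards, and the ε = 0.1 choice.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Toy epsilon-greedy agent on a Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # times each arm was pulled
    estimates = [0.0] * n_arms     # running sample-average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)            # explore: random arm
        else:
            arm = estimates.index(max(estimates))  # exploit: best estimate so far
        reward = rng.gauss(true_means[arm], 1.0)   # noisy reward from the chosen arm
        counts[arm] += 1
        # incremental sample average: Q_{n+1} = Q_n + (R_n - Q_n) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, total_reward

# hypothetical arm means; the agent should converge on the 0.8 arm
estimates, total = epsilon_greedy_bandit([0.1, 0.8, 0.5])
print(estimates, total)
```

With ε = 0 the agent exploits its initial estimates forever and can lock onto a bad arm; with ε = 1 it never uses anything it has learned. Neither extreme works, which is exactly the dilemma above.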

On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward.
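To make that concrete, the standard sample-average estimate (the $Q_n$, $q_*$ notation follows common bandit treatments, not anything specific to this note): after action $a$ has been tried $n$ times with rewards $R_1, \dots, R_n$,

$$
Q_n(a) = \frac{1}{n} \sum_{i=1}^{n} R_i \;\longrightarrow\; q_*(a) \quad \text{as } n \to \infty.
$$

By the law of large numbers the estimate converges to the true expected reward $q_*(a)$, but a single noisy pull tells you almost nothing, so every action needs many trials.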

The exploration–exploitation dilemma has been intensively studied by mathematicians for many decades, yet remains unresolved.

See Regret.
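A rough sketch of the usual definition, following the bandit setting of the article linked above (notation assumed here, not from this note): if $\mu^*$ is the expected reward of the best action and $r_t$ is the reward received at step $t$, the regret after $T$ steps is

$$
\mathcal{R}_T = T\mu^* - \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\right],
$$

i.e. how much reward the agent gave up by not always playing the best action. Good exploration strategies keep $\mathcal{R}_T$ growing sublinearly in $T$.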

The entire issue of balancing exploration and exploitation does not even arise in supervised and unsupervised learning, at least in the purest forms of these paradigms.