I am going to gather all of my research here, not on Notion, because I can directly make links here.
Some worries that I need to Sanity Check:
- The Poker bot knows to underrepresent its hand, and overrepresent its hand (bluff)
I personally think this code is pretty badly written. When you write code, you should write it such that the you that forgets about it and works on it again in 6 months should not have difficulty going through it.
- Implement Abstraction from here: https://github.com/matthewkennedy5/Poker, no here: https://github.com/Jingho/Deepstack-for-Bucket
- Implement depth-limited solving from here https://github.com/zanussbaum/pluribus/tree/master/leduc
- Actually, just look at the original paper. They also have details on how they constructed to superhuman AI agent
The Pipeline: Original Game ( (Automated Abstraction) Abstracted Game (Custom Equilibrium Finding Algorithm CFR) Nash Equilibrium (Reverse Model) Nash Equilibrium
- Real-time planning is super important. Without that you cannot do much. In Poker Libratus, it was subgame solving.
- How to evaluate performance in Poker? Because unlike Chess which can have an objective Elo System, Poker is a very stochastic game with a lot of luck involved. Which makes it very interesting. Doing abstractions taught me about the game of poker
A common example of a drawing hand is one in which the player has four cards of the same suit.At the present stage the hand is not very strong, but could become so if a card of the same suit showed up later in the game. Since the strength of such a hand could potentially turn out to be much different later in the game, it is generally accepted among poker experts that such a hand should be played differently than another hand with a similar chance of winning, but without as much potential.
If using the difference between probabilities of winning as the clustering metric, the abstraction algorithm would consider these two very different situations similar.
I need to implement depth-limited search to solve it on my computer, as well as make use of abstraction for the original blueprint strategy.
Some Calculations about how to solve Poker
TODO: Use counting and odds to put opponents on certain probabilities?
Total Possible Information Sets (Cards Only) These are for each stage of the game
Total Possible Information Sets (Actions Only), where we consider that the cards we are dealt with are fixed: See the paper, but in total total we have information sets
I am going to work on this project, I have a feeling it is going to be super interesting.
- Recently open-sourced solution, though it seems that it doesn’t include the code for abstractions, nor depth-limited solving https://github.com/ai-decision/decisionholdem
- https://github.com/matthewkennedy5/Poker -> Really good writing
- https://github.com/zanussbaum/pluribus (An attempt at implementing Pluribus)
- https://github.com/doas3140/PyStack (Python Implementation of DeepStack)
- These students tried to make a copy of Libratus: https://github.com/michalp21/coms4995-finalproj https://github.com/tansey/pycfr (8 years old) -> implementation in CFR, not support nolimit texas holdem
- Pokerbot https://github.com/dickreuter/Poker
- Gym Environment https://github.com/dickreuter/neuron_poker
- Use treys https://github.com/ihendley/treys for hand evaluation because it’s likely faster than my own :(
- https://github.com/gorel/C-Poker-AI (rand() is non-thread safe??)
- This seems very promising, and they have an online link https://rlcard.org/
- Computer poker: A review
- An Introduction to CFR (Neller, 2013) ESSENTIAL
- Vanilla CFR
- (CFR first introduced) Regret Minimization in Games with Incomplete Information (Bowling, 2007)
- Using CFR to Create Competitive Multiplayer Poker Agents (Risk, 2010)
- Monte-Carlo CFR (IMPORTANT)
- CFR-BR (CFR-Best Response)
- Finding Optimal Abstract Strategies in Extensive-Form Games (Burch, 2012) (IMPORTANT paper in finding)
- CFR-D (Decomposition)
- Solving Imperfect Information Games Using Decomposition (Burch, 2013)
- (Pseudocode) Solving Large Imperfect Information Games Using CFR+ (Tammelin, 2014)
- Solving Heads-up Limit Texas Hold’em (Tammelin, 2015)
- RBP Regret-Based Pruning
- RBP is particularly useful in large games where many actions are suboptimal, but where it is not known beforehand which actions those are
- Regret-Based Pruning in Extensive-Form Games (Brown, 2015)
- Warm Start CFR
- DCFR (Discounted CFR)
- ICFR (instant CFR)
- Deep CFR
- Depth-Limited Solving (IMPORTANT): This is a key technique that allows us to train a top tier Poker AI on our local computer, by improving a blueprint strategy.
- Depth-Limited Solving for Imperfect-Information Games (Brown, 2018)
- Abstractions (IMPORTANT): See Game Abstraction. Abstractions are absolutely necessary, since Texas Hold’Em is too big to solve directly
- A heads-up no-limit Texas Hold’em poker player: Discretized betting models and automatically generated equilibrium-finding programs
- Action Translation in Extensive-Form Games with Large Action Spaces: Axioms, Paradoxes, and the Pseudo-Harmonic Mapping (Sandholm, 2013)
- Evaluating State-Space Abstractions in Extensive-Form Games (Burch, 2013)
- Potential-Aware Imperfect-Recall Abstraction with Earth Mover’s Distance in Imperfect-Information Games (Sandholm, 2014)
- Abstraction for Solving Large Incomplete-Information Games (Sandholm, 2015)
- Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold’em Agent (Brown, 2015)
- Exploitability / Best Response
- Accelerating Best Response Calculation in Large Extensive Games (Johanson, 2011)
- Subgame Solving: This seems to be impossible to do on a local computer
- Measuring the Size of Poker
- Measuring the Size of Large No-Limit Poker Games (Johnson, 2013)
- Evaluating the Performance of a Poker Agent
- A TOOL FOR THE DIRECT ASSESSMENT OF POKER DECISIONS (Billings, 2006)
- Strategy Evaluation in Extensive Games with Importance Sampling (Bowling, 2008)
Other Links (Web Pages + Videos)
- https://poker.cs.ualberta.ca/resources.html, this is really good https://poker.cs.ualberta.ca/general_information.html for general information
- Poker Database: https://poker.cs.ualberta.ca/irc_poker_database.html
- The State of Techniques for Solving Large Imperfect-Information Games, Including Poker by Sandholm, really solid overview about abstractions of the game
- Superhuman AI for heads-up no-limit poker: Libratus beats top professionals by Noam Brown
- AI for Imperfect-Information Games: Beating Top Humans in No-Limit Poker by Noam Brown at Microsoft Research
Poker Agents Papers
- Slumbot “250,000 core hours and 2 TB of RAM to compute its strategy”
- Polaris (2008)
- Baby Tartanian 8 (2016) “2 million core hours and 18 TB of RAM to compute its strategy”
- DeepStack (2017)
- Libratus (2017)
- Blueprint Strategy (Full-Game Strategy) using MCCFR
- Subgame Solving with CFR+
- Adapt to opponent
- Pluribus, video here (2019)
The difficulty in making a poker AI is that it is an imperfect information game. In games of imperfect information, the value of your states depends on the strategy your opponent uses.
This is never the case in perfect information games, such as chess, since no matter your probability for each actions, there is an objective best move that you can calculate by going down the game tree.
Let’s think about a game of RPS, which is also a game of imperfect information: What would be the optimal strategy to beat the opponent? This is not such a simple question, because it depends on what the opponent plays.
If your opponent always plays rock, your optimal strategy is to always play paper. https://www.quantamagazine.org/the-game-theory-math-behind-rock-paper-scissors-20180402/
Thus, the “best” strategy depends on the strategy of your opponent. In a game of chess, this is not the case, you can just iterate through over all possibilities: just draw a tree and look at all the possible moves the opponent can make, and see the payout (win / loss / draw). You want to choose the move such that in all possibilities it doesn’t lead to a loss (or guarantees a win).
In Imperfect Information games like Poker, you can’t do that since you don’t know exactly which node you are at (i.e. what cards your opponent has). Two different scenarios are not differentiable from your perspective (technical way to say this is that an Information Set contains multiple nodes).
I think I am going to use ReBeL implementation.
And so this means I can just do it on my local computer.
Starting stack: 100BB https://wizardofodds.com/games/texas-hold-em/
Other Explanation “RL can’t be directly applied to Poker because of uncertainty. Poker game state can’t be calculated, and we can’t define the value function.”
- Most modern algorithms use this idea of a value function.
Also, a lot of the modern algorithms use observations that can represent the full mdp, its much harder for POMDP.
- But we can still do that with VFA in a model-free environment, it will just be very susceptible to being exploited (and have a bad policy). Which is why we need to play at nash equilibrium
- Generate a random probability distribution for the three values.
- Train using RL, and see how that performs, because I don’t think it matters right?
- And if we use a particular strategy, can the opponent adapt? That’s where we need Nash Equilibrium
My opponent can adjust to my playing style. TODO: SHOW THE MATRIX OF THE 4 types of poker players of the ULatvia Paper (we want to be Tight Aggressive)
- Problem 1: I am running into the problem of illegal moves: https://www.reddit.com/r/reinforcementlearning/comments/dgr2a2/illegal_moves_with_dqns_and_tic_tac_toe/
- Solution: Make the illegal moves very costly -> infinitely negative reward, so that it never chooses that. Also make the episode end when that is chosen.
- Alternative: Use masking, but it is hard to do that in
- Problem 2: How do we train this on an opponent? We need to generate data ourselves. There are also different types of opponents
- Solution: Self-play.
- Look at DeepStack
- Problem 3: Poker has a VERY VERY Large state space, will be very very hard to converge
- Solution (Potential): Card Abstraction + Action Abstraction
Train the poker on my strategy, which will need to use reinforcement learning.
1. The Sandbox
I would like to build a sandbox / environment in which I can train my AI. I know that OpenAI already provides these kind of sandboxes so I would base it off that, because that is how an AI learns.
2. The AI
The AI part of the Poker is actually where it gets super interesting.
When you build an AI like for Chess, it is a perfect information game. But Poker is an imperfect information game.
Step 1 is Game Abstraction, to reduce the complexity of the game while maintaining the relationships.
We need to store the bet history as a matplotlib function…
Different ways to train:
- Have it play against artificial data (learn from past winning decisions) (idk if this is even reinforcement learning)
- Have it play against data generated by itself (self-play) -> DO THIS
- “Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and go.”
- Have it play against other bots that already exist
- Have it play against real players
And there are so many state that one can be in, each different hand combination is a starting state.
The main difficulty that I have training it as a classical RL problem is that it is very sample inefficient. That means that even after I train it for a long time, it probably won’t know how to react for a specific scenario.
- Transfer Learning
- Counterfactual Regret Minimization
- What about Imitation Learning as a way to learn poker? Seems like a stupid idea because it is very exploitable
I will first think of my own ideas before going out there on the internet and looking at what has been done.
Poker can also use lies and deceit!! Wouldn’t that be cool, to have a poker bot lie?
GOAL: Finding the optimal X strategy to win as much money as possible. Poker is not about winning each pot, but about winning money.
Also, playing poker, you can either face 1 player or multiple players, so that has a factor.
2 approaches: We call these different strategies policies. So we can test out various policies.
- Hard-code rules.
- We can create our own different rules
I want to build the sandbox. Then, some of this logic can be generalized to other games like Uno.
Read some of the papers published.