Poker AI

I am going to gather all of my research here, not on Notion, because I can directly make links here.

Git Repositories

Recently open-sourced solution, though it seems that it doesn’t include the code for abstractions, nor depth-limited solving https://github.com/ai-decision/decisionholdem
https://github.com/matthewkennedy5/Poker → Really good writing
https://github.com/fedden/poker_ai
https://github.com/jneckar/skybet
https://github.com/zanussbaum/pluribus (An attempt at implementing Pluribus)
https://github.com/doas3140/PyStack (Python Implementation of DeepStack)
These students tried to make a copy of Libratus: https://github.com/michalp21/coms4995-finalproj https://github.com/tansey/pycfr (8 years old) → implementation in CFR, not support nolimit texas holdem
Pokerbot https://github.com/dickreuter/Poker
Gym Environment https://github.com/dickreuter/neuron_poker

TODO optimizations:

Use treys https://github.com/ihendley/treys for hand evaluation because it’s likely faster than my own :(
https://github.com/gorel/C-Poker-AI (rand() is non-thread safe??)
https://github.com/datamllab/rlcard
- This seems very promising, and they have an online link https://rlcard.org/

Blogs

Other:

Really good tutorial by a guy who played 10+ years online poker
Poker Mathematics Book

Paper links

Computer poker: A review
An Introduction to CFR (Neller, 2013) ESSENTIAL
Vanilla CFR
- (CFR first introduced) Regret Minimization in Games with Incomplete Information (Bowling, 2007)
- Using CFR to Create Competitive Multiplayer Poker Agents (Risk, 2010)
Monte-Carlo CFR (IMPORTANT)
- Monte Carlo Sampling for Regret Minimization in Extensive Games
- Efficient MCCFR in Games with Many Player Actions (Burch, 2012)
CFR-BR (CFR-Best Response)
- Finding Optimal Abstract Strategies in Extensive-Form Games (Burch, 2012) (IMPORTANT paper in finding)
CFR-D (Decomposition)
- Solving Imperfect Information Games Using Decomposition (Burch, 2013)
CFR+
- (Pseudocode) Solving Large Imperfect Information Games Using CFR+ (Tammelin, 2014)
- Solving Heads-up Limit Texas Hold’em (Tammelin, 2015)
RBP Regret-Based Pruning
- RBP is particularly useful in large games where many actions are suboptimal, but where it is not known beforehand which actions those are
- Regret-Based Pruning in Extensive-Form Games (Brown, 2015)
Warm Start CFR
- Strategy-Based Warm Starting for Regret Minimization in Games (Brown, 2015)
DCFR (Discounted CFR)
- Solving Imperfect-Information Games via Discounted Regret Minimization (Brown, 2018)
ICFR (instant CFR)
- Efficient CFR for Imperfect Information Games with Instant Updates (Li, 2019)
Deep CFR
- Deep Counterfactual Regret Minimization (Brown, 2018)
- Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Brown, 2020)

Other ideas

Depth-Limited Solving (IMPORTANT): This is a key technique that allows us to train a top tier Poker AI on our local computer, by improving a blueprint strategy.
- Depth-Limited Solving for Imperfect-Information Games (Brown, 2018)
Abstractions (IMPORTANT): See Game Abstraction. Abstractions are absolutely necessary, since Texas Hold’Em is too big to solve directly
Exploitability / Best Response
- Accelerating Best Response Calculation in Large Extensive Games (Johanson, 2011)
Subgame Solving: This seems to be impossible to do on a local computer
- Safe and Nested Subgame Solving for Imperfect-Information Games (Brown, 2017)
Measuring the Size of Poker
- Measuring the Size of Large No-Limit Poker Games (Johnson, 2013)
Evaluating the Performance of a Poker Agent
- A TOOL FOR THE DIRECT ASSESSMENT OF POKER DECISIONS (Billings, 2006)
- Strategy Evaluation in Extensive Games with Importance Sampling (Bowling, 2008)

Poker Equity: https://www.pokernews.com/strategy/talking-poker-equity-21291.htm#:~:text=When%20you%20play%20poker%2C%20’Equity,at%20that%20moment%20is%20%2490.

Other Links (Web Pages + Videos)

https://poker.cs.ualberta.ca/resources.html, this is really good https://poker.cs.ualberta.ca/general_information.html for general information
Poker Database: https://poker.cs.ualberta.ca/irc_poker_database.html
The State of Techniques for Solving Large Imperfect-Information Games, Including Poker by Sandholm, really solid overview about abstractions of the game
Superhuman AI for heads-up no-limit poker: Libratus beats top professionals by Noam Brown
- AI for Imperfect-Information Games: Beating Top Humans in No-Limit Poker by Noam Brown at Microsoft Research

Poker Agents Papers

Slumbot “250,000 core hours and 2 TB of RAM to compute its strategy”
Polaris (2008)
Baby Tartanian 8 (2016) “2 million core hours and 18 TB of RAM to compute its strategy”
DeepStack (2017)
Libratus (2017)
1. Blueprint Strategy (Full-Game Strategy) using MCCFR
2. Subgame Solving with CFR+
3. Adapt to opponent
Pluribus, video here (2019)

Some worries that I need to Sanity Check:

The Poker bot knows to underrepresent its hand, and overrepresent its hand (bluff)

I personally think this code is pretty badly written. When you write code, you should write it such that the you that forgets about it and works on it again in 6 months should not have difficulty going through it.

The Plan:

Implement Abstraction from here: https://github.com/matthewkennedy5/Poker, no here: https://github.com/Jingho/Deepstack-for-Bucket
Implement depth-limited solving from here https://github.com/zanussbaum/pluribus/tree/master/leduc
- Actually, just look at the original paper. They also have details on how they constructed to superhuman AI agent
- https://github.dev/zanussbaum/pluribus/blob/master/leduc/search.py

The Pipeline: Original Game ( $1 0^{161}$ $\to$ (Automated Abstraction) $\to$ Abstracted Game $\to$ (Custom Equilibrium Finding Algorithm CFR) $\to$ Nash Equilibrium $\to$ (Reverse Model) $\to$ Nash Equilibrium

Some Lessons

Real-time planning is super important. Without that you cannot do much. In Poker Libratus, it was subgame solving.

How to evaluate performance in Poker? Because unlike Chess which can have an objective Elo System, Poker is a very stochastic game with a lot of luck involved. Which makes it very interesting. Doing abstractions taught me about the game of poker

A common example of a drawing hand is one in which the player has four cards of the same suit.At the present stage the hand is not very strong, but could become so if a card of the same suit showed up later in the game. Since the strength of such a hand could potentially turn out to be much different later in the game, it is generally accepted among poker experts that such a hand should be played differently than another hand with a similar chance of winning, but without as much potential.

If using the difference between probabilities of winning as the clustering metric, the abstraction algorithm would consider these two very different situations similar.

I need to implement depth-limited search to solve it on my computer, as well as make use of abstraction for the original blueprint strategy.

Concepts:

Game Abstraction

Some Calculations about how to solve Poker

TODO: Use counting and odds to put opponents on certain probabilities?

Total Possible Information Sets (Cards Only) These are for each stage of the game $(2 52) + (2 52) (3 50) + (2 52) (3 50) (1 47) + (2 52) (3 50) (1 47) (1 46)$

Total Possible Information Sets (Actions Only), where we consider that the cards we are dealt with are fixed: See the paper, but in total total we have $3.19 \times 1 0^{14}$ information sets

https://stackoverflow.com/questions/59435354/how-can-i-speed-up-my-python-poker-hand-vs-hand-equity-calculator

Poker Hand Evaluation

I am going to work on this project, I have a feeling it is going to be super interesting.

Notes

The difficulty in making a poker AI is that it is an imperfect information game. In games of imperfect information, the value of your states depends on the strategy your opponent uses.

This is never the case in perfect information games, such as chess, since no matter your probability for each actions, there is an objective best move that you can calculate by going down the game tree.

Let’s think about a game of RPS, which is also a game of imperfect information: What would be the optimal strategy to beat the opponent? This is not such a simple question, because it depends on what the opponent plays.

If your opponent always plays rock, your optimal strategy is to always play paper. https://www.quantamagazine.org/the-game-theory-math-behind-rock-paper-scissors-20180402/

Thus, the “best” strategy depends on the strategy of your opponent. In a game of chess, this is not the case, you can just iterate through over all possibilities: just draw a tree and look at all the possible moves the opponent can make, and see the payout (win / loss / draw). You want to choose the move such that in all possibilities it doesn’t lead to a loss (or guarantees a win).

In Imperfect Information games like Poker, you can’t do that since you don’t know exactly which node you are at (i.e. what cards your opponent has). Two different scenarios are not differentiable from your perspective (technical way to say this is that an Information Set contains multiple nodes).

The solution is to play at the Nash Equilibrium, which can be done using an algorithm called Nash Equilibrium, which can be done using an algorithm called Counterfactual Regret Minimization.

I think I am going to use ReBeL implementation.

And so this means I can just do it on my local computer.

Starting stack: 100BB https://wizardofodds.com/games/texas-hold-em/

Other Explanation “RL can’t be directly applied to Poker because of uncertainty. Poker game state can’t be calculated, and we can’t define the value function.”

Most modern algorithms use this idea of a value function.

Also, a lot of the modern algorithms use observations that can represent the full mdp, its much harder for POMDP.

But we can still do that with VFA in a model-free environment, it will just be very susceptible to being exploited (and have a bad policy). Which is why we need to play at nash equilibrium

Generate a random probability distribution for the three values.
Train using RL, and see how that performs, because I don’t think it matters right?
And if we use a particular strategy, can the opponent adapt? That’s where we need Nash Equilibrium

My opponent can adjust to my playing style. TODO: SHOW THE MATRIX OF THE 4 types of poker players of the ULatvia Paper (we want to be Tight Aggressive)

Problem 1: I am running into the problem of illegal moves: https://www.reddit.com/r/reinforcementlearning/comments/dgr2a2/illegal_moves_with_dqns_and_tic_tac_toe/
- Solution: Make the illegal moves very costly → infinitely negative reward, so that it never chooses that. Also make the episode end when that is chosen.
- Alternative: Use masking, but it is hard to do that in OpenAI Gym
Problem 2: How do we train this on an opponent? We need to generate data ourselves. There are also different types of opponents
- Solution: Self-play.
- Look at DeepStack
Problem 3: Poker has a VERY VERY Large state space, will be very very hard to converge
- Solution (Potential): Card Abstraction + Action Abstraction

Outline

Train the poker on my strategy, which will need to use reinforcement learning.

1. The Sandbox

I would like to build a sandbox / environment in which I can train my AI. I know that OpenAI already provides these kind of sandboxes so I would base it off that, because that is how an AI learns.

2. The AI

The AI part of the Poker is actually where it gets super interesting.

When you build an AI like for Chess, it is a perfect information game. But Poker is an imperfect information game.

Step 1 is Game Abstraction, to reduce the complexity of the game while maintaining the relationships.

We need to store the bet history as a matplotlib function…

Different ways to train:

Have it play against artificial data (learn from past winning decisions) (idk if this is even reinforcement learning)
Have it play against data generated by itself (self-play) → DO THIS
- “Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and go.”
Have it play against other bots that already exist
Have it play against real players

And there are so many state that one can be in, each different hand combination is a starting state.

The main difficulty that I have training it as a classical RL problem is that it is very sample inefficient. That means that even after I train it for a long time, it probably won’t know how to react for a specific scenario.

Concepts

Transfer Learning
Counterfactual Regret Minimization
What about Imitation Learning as a way to learn poker? Seems like a stupid idea because it is very exploitable

Research

Brainstorming Phase

I will first think of my own ideas before going out there on the internet and looking at what has been done.

Poker can also use lies and deceit!! Wouldn’t that be cool, to have a poker bot lie?

GOAL: Finding the optimal X strategy to win as much money as possible. Poker is not about winning each pot, but about winning money.

Also, playing poker, you can either face 1 player or multiple players, so that has a factor.

2 approaches: We call these different strategies policies. So we can test out various policies.

Hard-code rules.
We can create our own different rules

I want to build the sandbox. Then, some of this logic can be generalized to other games like Uno.

Read some of the papers published.

🛠️ Steven Gong

Table of Contents