Poker AI

I am going to gather all of my research here, not on Notion, because I can directly make links here.

Git Repositories

TODO optimizations:



  • Really good tutorial by a guy who played 10+ years online poker
  • Poker Mathematics Book

Paper links

Other ideas

Poker Equity:’Equity,at%20that%20moment%20is%20%2490.

Other Links (Web Pages + Videos)

Poker Agents Papers

  • Slumbot “250,000 core hours and 2 TB of RAM to compute its strategy”
  • Polaris (2008)
  • Baby Tartanian 8 (2016) “2 million core hours and 18 TB of RAM to compute its strategy”
  • DeepStack (2017)
  • Libratus (2017)
    1. Blueprint Strategy (Full-Game Strategy) using MCCFR
    2. Subgame Solving with CFR+
    3. Adapt to opponent
  • Pluribus, video here (2019)

Some worries that I need to Sanity Check:

  • The Poker bot knows to underrepresent its hand, and overrepresent its hand (bluff)

I personally think this code is pretty badly written. When you write code, you should write it such that the you that forgets about it and works on it again in 6 months should not have difficulty going through it.

The Plan:

The Pipeline: Original Game ( (Automated Abstraction) Abstracted Game (Custom Equilibrium Finding Algorithm CFR) Nash Equilibrium (Reverse Model) Nash Equilibrium

Some Lessons

  • Real-time planning is super important. Without that you cannot do much. In Poker Libratus, it was subgame solving.
  • How to evaluate performance in Poker? Because unlike Chess which can have an objective Elo System, Poker is a very stochastic game with a lot of luck involved. Which makes it very interesting. Doing abstractions taught me about the game of poker

A common example of a drawing hand is one in which the player has four cards of the same suit.At the present stage the hand is not very strong, but could become so if a card of the same suit showed up later in the game. Since the strength of such a hand could potentially turn out to be much different later in the game, it is generally accepted among poker experts that such a hand should be played differently than another hand with a similar chance of winning, but without as much potential.

If using the difference between probabilities of winning as the clustering metric, the abstraction algorithm would consider these two very different situations similar.

I need to implement depth-limited search to solve it on my computer, as well as make use of abstraction for the original blueprint strategy.


Some Calculations about how to solve Poker

TODO: Use counting and odds to put opponents on certain probabilities?

Total Possible Information Sets (Cards Only) These are for each stage of the game

Total Possible Information Sets (Actions Only), where we consider that the cards we are dealt with are fixed: See the paper, but in total total we have information sets

Poker Hand Evaluation

I am going to work on this project, I have a feeling it is going to be super interesting.


The difficulty in making a poker AI is that it is an imperfect information game. In games of imperfect information, the value of your states depends on the strategy your opponent uses.

This is never the case in perfect information games, such as chess, since no matter your probability for each actions, there is an objective best move that you can calculate by going down the game tree.

Let’s think about a game of RPS, which is also a game of imperfect information: What would be the optimal strategy to beat the opponent? This is not such a simple question, because it depends on what the opponent plays.

If your opponent always plays rock, your optimal strategy is to always play paper.

Thus, the “best” strategy depends on the strategy of your opponent. In a game of chess, this is not the case, you can just iterate through over all possibilities: just draw a tree and look at all the possible moves the opponent can make, and see the payout (win / loss / draw). You want to choose the move such that in all possibilities it doesn’t lead to a loss (or guarantees a win).

In Imperfect Information games like Poker, you can’t do that since you don’t know exactly which node you are at (i.e. what cards your opponent has). Two different scenarios are not differentiable from your perspective (technical way to say this is that an Information Set contains multiple nodes).

The solution is to play at the Nash Equilibrium, which can be done using an algorithm called Nash Equilibrium, which can be done using an algorithm called Counterfactual Regret Minimization.

I think I am going to use ReBeL implementation.

And so this means I can just do it on my local computer.

Starting stack: 100BB

Other Explanation “RL can’t be directly applied to Poker because of uncertainty. Poker game state can’t be calculated, and we can’t define the value function.”

  • Most modern algorithms use this idea of a value function.

Also, a lot of the modern algorithms use observations that can represent the full mdp, its much harder for POMDP.

  • But we can still do that with VFA in a model-free environment, it will just be very susceptible to being exploited (and have a bad policy). Which is why we need to play at nash equilibrium
  1. Generate a random probability distribution for the three values.
  2. Train using RL, and see how that performs, because I don’t think it matters right?
  3. And if we use a particular strategy, can the opponent adapt? That’s where we need Nash Equilibrium

My opponent can adjust to my playing style. TODO: SHOW THE MATRIX OF THE 4 types of poker players of the ULatvia Paper (we want to be Tight Aggressive)

  • Problem 1: I am running into the problem of illegal moves:
    • Solution: Make the illegal moves very costly infinitely negative reward, so that it never chooses that. Also make the episode end when that is chosen.
    • Alternative: Use masking, but it is hard to do that in OpenAI Gym
  • Problem 2: How do we train this on an opponent? We need to generate data ourselves. There are also different types of opponents
    • Solution: Self-play.
    • Look at DeepStack
  • Problem 3: Poker has a VERY VERY Large state space, will be very very hard to converge
    • Solution (Potential): Card Abstraction + Action Abstraction


Train the poker on my strategy, which will need to use reinforcement learning.

1. The Sandbox

I would like to build a sandbox / environment in which I can train my AI. I know that OpenAI already provides these kind of sandboxes so I would base it off that, because that is how an AI learns.

2. The AI

The AI part of the Poker is actually where it gets super interesting.

When you build an AI like for Chess, it is a perfect information game. But Poker is an imperfect information game.

Step 1 is Game Abstraction, to reduce the complexity of the game while maintaining the relationships.

We need to store the bet history as a matplotlib function…

Different ways to train:

  • Have it play against artificial data (learn from past winning decisions) (idk if this is even reinforcement learning)
  • Have it play against data generated by itself (self-play) DO THIS
    • “Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and go.”
  • Have it play against other bots that already exist
  • Have it play against real players

And there are so many state that one can be in, each different hand combination is a starting state.

The main difficulty that I have training it as a classical RL problem is that it is very sample inefficient. That means that even after I train it for a long time, it probably won’t know how to react for a specific scenario.



Brainstorming Phase

I will first think of my own ideas before going out there on the internet and looking at what has been done.

Poker can also use lies and deceit!! Wouldn’t that be cool, to have a poker bot lie?

GOAL: Finding the optimal X strategy to win as much money as possible. Poker is not about winning each pot, but about winning money.

Also, playing poker, you can either face 1 player or multiple players, so that has a factor.

2 approaches: We call these different strategies policies. So we can test out various policies.

  1. Hard-code rules.
  2. We can create our own different rules

I want to build the sandbox. Then, some of this logic can be generalized to other games like Uno.

Read some of the papers published.