Game Theory

Game Abstraction

I encountered this while working on the Poker AI. There's a really good paper that explains how abstractions work (linked below under Card Abstraction).

There are two types of Abstractions:

  1. Information Abstraction (ex: Card Abstraction)
  2. Action Abstraction

Imperfect Recall

  • Imperfect recall of actions lets us trade off fine-grained distinctions for reduced computational space and time requirements.

  • Action translation

  • Card abstraction. Treat certain hands identically. However, it makes it a little harder to bluff.

Lossless Abstraction

  • You don’t lose information.

For example, if you hold the ace of hearts pre-flop, you don't care that it is specifically a heart — suits are interchangeable, so only the ranks and whether the cards are suited matter.

Canonicalizing gives 169 starting hands instead of 1,326 (52 choose 2).

There are algorithms that compute such lossless abstractions automatically.
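The 169 number is easy to verify by enumerating all two-card combinations and collapsing suit information. A quick sketch:

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def canonical(hand):
    """Map a two-card hand to its canonical class: the two ranks
    (sorted high-to-low) plus whether the cards share a suit."""
    (r1, s1), (r2, s2) = hand
    ranks = tuple(sorted((r1, r2), key=RANKS.index, reverse=True))
    return ranks, s1 == s2

# 52 choose 2 = 1,326 raw combinations collapse to 169 canonical hands
# (13 pairs + 78 suited + 78 offsuit combos)
classes = {canonical(combo) for combo in combinations(DECK, 2)}
print(len(classes))  # 169
```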

Card Abstraction

Taking notes from this paper: https://www.cs.cmu.edu/~sandholm/potential-aware_imperfect-recall.aaai14.pdf

Instead of manually clustering the hands based on our own definitions, it is better to cluster them based on their objective strength.

In Poker, the measure of a hand strength is given in terms of the equity (probability of winning plus 1/2 of the probability of tying) against a uniform random draw of private cards for the opponent, assuming a uniform random rollout of the remaining public cards.

This is known as the expected hand strength (EHS). Okay, I'll implement code to check that. https://www.datacamp.com/tutorial/python-probability-tutorial

I can just use a Monte-Carlo method: every win counts +1, every loss 0, and every tie +0.5. At the end, divide by the total number of trials.
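As a sketch of the idea, here is a Monte-Carlo equity estimate for a toy one-card game (high card wins) rather than full Hold'em — but the scoring scheme (+1 win, +0.5 tie, 0 loss, divide by trials) is exactly the one described above:

```python
import random

# 52-card deck with suits ignored: for this toy game only ranks matter
RANKS = list(range(2, 15)) * 4

def estimate_equity(my_rank, trials=200_000, seed=0):
    """Monte-Carlo equity for a toy one-card game: each player holds one
    card, higher rank wins, equal ranks tie. A win scores 1, a tie 0.5,
    a loss 0; the running average converges to the true equity."""
    rng = random.Random(seed)
    deck = list(RANKS)
    deck.remove(my_rank)            # remove the single copy we hold
    total = 0.0
    for _ in range(trials):
        opp = rng.choice(deck)      # uniform random opponent card
        if my_rank > opp:
            total += 1.0            # win
        elif my_rank == opp:
            total += 0.5            # tie (split pot)
    return total / trials

# Holding an ace: the true equity is (48 wins + 0.5 * 3 ties) / 51 ≈ 0.9706
print(estimate_equity(14))
```

The same loop works for real Hold'em equity; the only change is replacing the rank comparison with a full 7-card hand evaluator and sampling the opponent's two hole cards plus the remaining board.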

Given enough random samples, my average return will converge to the expected hand strength over time. This is exactly the same idea as Monte-Carlo Learning in Reinforcement Learning.

  • Formally, as n → ∞, (1/n) · Σ rᵢ → EHS, where rᵢ ∈ {0, 0.5, 1} is the outcome of trial i.

https://www.cardplayer.com/poker-tools/odds-calculator/texas-holdem

However, EHS doesn’t paint the entire picture. We need to look at the distribution of our hand strength across possible board runouts. Two hands with the same EHS can have drastically different distributions.

  • It means that with certain community cards (imagine the flop), a pair of cards either does really badly or really well. Which makes sense: when you have KQ, it only does well if the flop shows a king or a queen (so you make a pair). Otherwise, it’s close to worthless.

So rather than looking at a single value of the EHS, we can look at a distribution of the EHS over various stages of the game. For example, if we want to generate a distribution for the EHS of the turn (so we are given our private cards + 3 community cards), we draw various turn cards, and calculate the equity using those turn cards. If we find for a given card that its equity is 0.645, and we have bucketed at intervals of 0.10, we would increment the bucket 0.60-0.70 by one. We repeat this process until we get enough samples.
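A minimal sketch of that bucketing step (the 0.645 → 0.60–0.70 example above), assuming the equities have already been sampled:

```python
def equity_histogram(equities, n_buckets=10):
    """Count sampled equities into n_buckets equal-width bins;
    e.g. with 10 buckets, an equity of 0.645 lands in the 0.60-0.70 bin."""
    hist = [0] * n_buckets
    for e in equities:
        idx = min(int(e * n_buckets), n_buckets - 1)  # clamp equity 1.0 into the top bin
        hist[idx] += 1
    return hist

print(equity_histogram([0.645, 0.12, 0.98, 1.0]))  # [0, 1, 0, 0, 0, 0, 1, 0, 0, 2]
```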

Serendipity! -> this is very similar to the Multi-Armed Bandit modelling the reward: whether we use a single value as the average of the rewards (such as when we do Epsilon-Greedy), or use distributions to model it like in Thompson Sampling.

We use the Earth Mover’s Distance (EMD) to measure the distance between equity histograms.
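For 1-D histograms of equal total mass, the EMD has a simple closed form: the sum of absolute differences between the running cumulative sums. A small sketch:

```python
def emd_1d(h1, h2):
    """Earth Mover's Distance between two 1-D histograms of equal total
    mass: the total |difference| of their cumulative sums, measured in
    units of bucket widths."""
    assert sum(h1) == sum(h2), "histograms must have equal mass"
    dist, c1, c2 = 0.0, 0.0, 0.0
    for a, b in zip(h1, h2):
        c1 += a
        c2 += b
        dist += abs(c1 - c2)
    return dist

print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0 — all the mass moves two buckets over
```

Intuitively, this rewards histograms whose mass sits in nearby buckets, which a bucket-by-bucket L2 distance would miss entirely.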

In the domain of Texas Hold’em poker, the leading abstraction algorithm works as follows (Johanson et al. 2013).

  1. Pre-flop: No card abstraction, each hand is in its own bucket.
  2. Flop and Turn rounds: abstractions are computed as follows:
    1. An equity histogram is constructed for each hand, similar to those in Figures 1 and 2 of the paper. For example, for the flop, we create a histogram for the hand where the private cards are Kc3h and the public cards are KsTd8h.
    2. k-means is then used to compute an abstraction with a desired number of clusters, using the EMD between each pair of histograms as the distance metric.

One important feature of these abstractions is that they have imperfect recall: a player can be made to forget information that he knew earlier in the hand. For example, the hands Kc3h-KsTd8h and Kc4h-KsTd8h will likely be grouped together on the flop, even though the player could distinguish between Kc3h and Kc4h in the pre-flop round. Imperfect-recall abstractions have been demonstrated to lead to significantly stronger performance than perfect recall for an abstraction of a given size, because they allow the player a more refined view of the present since he is allowed to forget details about the past (Waugh et al. 2009b).

The algorithm computes abstractions for the flop and turn rounds independently using this approach. It computes the abstraction for the final round using a different approach (k-means with L2 over vectors of EHS against first-round clusters of the opponent).
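That clustering step could be sketched like this — a toy k-means over equity histograms using EMD as the distance metric. Assumed simplifications: deterministic farthest-point initialization and bucket-wise-mean centroid updates (the paper's EMD-based k-means handles centroids more carefully, and runs over millions of hands):

```python
def emd_1d(h1, h2):
    # 1-D EMD between equal-mass histograms: sum of |cumulative differences|
    dist, c1, c2 = 0.0, 0.0, 0.0
    for a, b in zip(h1, h2):
        c1 += a
        c2 += b
        dist += abs(c1 - c2)
    return dist

def cluster_histograms(hists, k, iters=20):
    """Simplified k-means over normalized equity histograms: assign each
    histogram to the centroid with the smallest EMD, then recompute each
    centroid as the bucket-wise mean of its members."""
    # Farthest-point initialization: start from the first histogram, then
    # repeatedly add the histogram farthest from all chosen centroids.
    centroids = [list(hists[0])]
    while len(centroids) < k:
        far = max(hists, key=lambda h: min(emd_1d(h, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for h in hists:
            nearest = min(range(k), key=lambda j: emd_1d(h, centroids[j]))
            clusters[nearest].append(h)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k), key=lambda j: emd_1d(h, centroids[j])) for h in hists]
    return centroids, labels

# Three "weak" and three "strong" equity histograms (4 buckets, mass 1 each)
hists = [
    [0.9, 0.1, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.2, 0.8],
]
_, labels = cluster_histograms(hists, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```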

Potential-aware means also looking at how a hand’s strength can evolve in future rounds, not just its current strength distribution.

Action Abstraction

Group similar bet sizes together into a small set of abstract actions.