Game Abstraction
I encountered this as I was working on the Poker AI. Really good papers that explain how abstractions work:
 Abstraction for Solving Large Incomplete-Information Games (Sandholm)
 Evaluating State-Space Abstractions in Extensive-Form Games
There are two types of Abstractions:
 Information Abstraction (ex: Card Abstraction)
 Action Abstraction
Imperfect Recall

Imperfect recall of actions lets us trade fine-grained distinctions for reduced computational space and time requirements.

Action translation

Card abstraction: treat certain hands identically. One side effect is that bluffing becomes a little harder, since the strategy cannot distinguish between hands in the same bucket.
Lossless Abstraction
 You don’t lose information.
If you hold the ace of hearts preflop, you don't care that it is specifically a heart: suits are interchangeable, so suit-isomorphic hands can be merged without losing any information.
The canonical representation has 169 starting hands instead of $\binom{52}{2} = 1326$.
You get an algorithm that solves this abstracted game with no loss relative to the full game.
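The 169 number can be verified directly: collapse every two-card combo to its suit-isomorphic canonical form (high rank, low rank, suited or not). A quick sketch:

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"

# All 52 cards, e.g. "Ah" = ace of hearts.
deck = [r + s for r in RANKS for s in SUITS]

# Map each two-card combo to its canonical form: since suits are
# interchangeable preflop, only (high rank, low rank, suited?) matters.
canonical = set()
for c1, c2 in combinations(deck, 2):
    hi, lo = sorted([c1, c2], key=lambda c: RANKS.index(c[0]), reverse=True)
    canonical.add((hi[0], lo[0], hi[1] == lo[1]))

print(len(list(combinations(deck, 2))))  # 1326 raw combos
print(len(canonical))                    # 169 canonical starting hands
```

The 169 break down as 13 pocket pairs + 78 suited + 78 offsuit non-pair combinations.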
Two Abstractions
Imperfect Recall
Card Abstraction
Taking notes from this paper: https://www.cs.cmu.edu/~sandholm/potentialaware_imperfectrecall.aaai14.pdf
Instead of manually clustering the hands based on our own definitions, it is better to cluster them based on their objective strength.
In Poker, the measure of a hand strength is given in terms of the equity (probability of winning plus 1/2 of the probability of tying) against a uniform random draw of private cards for the opponent, assuming a uniform random rollout of the remaining public cards.
This is known as the expected hand strength (EHS). Okay, I'll implement code to check that. https://www.datacamp.com/tutorial/python-probability-tutorial
I can just use a Monte Carlo method: every time I win, +1; every time I lose, 0; a tie is +0.5. In the end, divide by the total number of trials.
Given enough random samples $n$, my returns converge over time to the expected hand strength. This is exactly the same idea as Monte Carlo learning in Reinforcement Learning.
 Formally, as $n \to \infty$, $G_{n} \to \text{EHS}$, where $G_{n} = \frac{1}{n} \sum_{i=1}^{n} R_{i}$
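The Monte Carlo estimate above can be sketched as follows. Note that `hand_value` here is a toy stand-in (it only ranks hands by rank multiplicities, not real 5-card poker rankings), and names like `estimate_ehs` are my own; a real implementation would plug in a proper 7-card evaluator.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def hand_value(cards):
    # Toy evaluator (NOT a real poker ranker): score a 7-card hand by its
    # rank multiplicities, highest count first, ties broken by rank.
    counts = {}
    for c in cards:
        counts[c[0]] = counts.get(c[0], 0) + 1
    return sorted(((n, RANKS.index(r)) for r, n in counts.items()), reverse=True)

def estimate_ehs(hole, board, trials=2000, rng=random.Random(0)):
    # Equity = P(win) + 0.5 * P(tie) against a uniform random opponent hand,
    # with a uniform random rollout of the remaining board cards.
    total = 0.0
    for _ in range(trials):
        rest = [c for c in DECK if c not in hole + board]
        rng.shuffle(rest)
        opp = rest[:2]
        full_board = board + rest[2 : 2 + (5 - len(board))]
        ours = hand_value(hole + full_board)
        theirs = hand_value(opp + full_board)
        total += 1.0 if ours > theirs else 0.5 if ours == theirs else 0.0
    return total / trials

# Pocket aces preflop should show high equity even under the toy evaluator.
print(estimate_ehs(["Ah", "Ad"], []))
```

Each trial is one rollout reward $R_i$, and the returned average is $G_n$.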
https://www.cardplayer.com/poker-tools/odds-calculator/texas-holdem
However, EHS doesn't paint the entire picture. We need to look at the distribution of our hand strength against various opponents: two hands with the same EHS can have drastically different distributions.
 It means that with certain community cards (imagine the flop), a pair of cards does either really badly or really well. Which makes sense: when you have KQ, it only does well if the flop shows a king or a queen (so you get a pair); otherwise, it's close to worthless.
So rather than looking at a single value of the EHS, we can look at a distribution of the EHS over various stages of the game. For example, if we want to generate a distribution for the EHS of the turn (so we are given our private cards + 3 community cards), we draw various turn cards, and calculate the equity using those turn cards. If we find for a given card that its equity is 0.645, and we have bucketed at intervals of 0.10, we would increment the bucket 0.60 to 0.70 by one. We repeat this process until we get enough samples.
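The bucketing step can be sketched as follows (the function name and the normalization choice are mine; the paper's histograms work the same way at 0.10-wide intervals):

```python
def equity_histogram(equities, n_buckets=10):
    # Bucket sampled equities into equal-width bins over [0, 1] and
    # normalize so the histogram sums to 1.
    hist = [0] * n_buckets
    for e in equities:
        idx = min(int(e * n_buckets), n_buckets - 1)  # clamp e == 1.0
        hist[idx] += 1
    total = len(equities)
    return [h / total for h in hist]

# e.g. an equity of 0.645 increments the 0.60-0.70 bucket
print(equity_histogram([0.645, 0.65, 0.12, 0.99, 1.0]))
# [0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.4]
```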
notes/Serendipity! > this is very similar to the notes/Multi-Armed Bandit modelling of the reward: whether we use a single value as the average of the rewards (such as when we do notes/Epsilon-Greedy), or use distributions to model it like in notes/Thompson Sampling.
We use the Earth Mover's Distance to measure how far apart two equity histograms are.
In the domain of Texas Hold'em poker, the leading abstraction algorithm works as follows (Johanson et al. 2013).
 Preflop: No card abstraction, each hand is in its own bucket.
 Flop and Turn rounds: abstractions are computed as follows:
 An equity histogram is constructed for each hand, similarly to those in Figures 1 and 2. For example, for the flop, we will create a histogram for the hand where the private cards are Kc3h and the public cards are KsTd8h.
 Then k-means is used to compute an abstraction with a desired number of clusters, using the EMD between each pair of histograms as the distance metric.
One important feature of these abstractions is that they have imperfect recall: a player can be made to forget information that he knew earlier in the hand. For example, the hands Kc3h-KsTd8h and Kc4h-KsTd8h will likely be grouped together on the flop, even though the player could distinguish between Kc3h and Kc4h in the preflop round. Imperfect-recall abstractions have been demonstrated to lead to significantly stronger performance than perfect recall for an abstraction of a given size, because they allow the player to have a more refined view of the present since he is allowed to forget details about the past (Waugh et al. 2009b).
That algorithm computes abstractions for the flop and turn rounds independently using this approach. It computes the abstraction for the final round using a different approach (k-means with L2 over vectors of EHS against first-round clusters of the opponent).
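For 1-D histograms with equal total mass, EMD reduces to a cumulative-difference sum, which is what makes the pairwise distance computation in the k-means step cheap. A minimal sketch (the function name is mine):

```python
def emd_1d(h1, h2):
    # Earth Mover's Distance between two 1-D histograms of equal total mass.
    # Running cumulative difference = how much "earth" must still be carried
    # past each bucket boundary; summing its absolute value gives the EMD.
    carry, dist = 0.0, 0.0
    for a, b in zip(h1, h2):
        carry += a - b
        dist += abs(carry)
    return dist

# All mass moved one bucket over costs exactly one unit of work.
print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1.0
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```

This distance then drives k-means: histograms whose mass sits in nearby buckets cluster together, while histograms with the same mean but different shapes (the polarized KQ-type hands) stay apart.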
Potential-aware abstraction means also looking at how a hand's equity distribution can evolve in future rounds, not just its strength now.
Action Abstraction
Group together similar bet sizes
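The simplest version of this is nearest-neighbor action translation: map any observed bet to the closest size in the abstraction. This is a sketch under that assumption (real systems use more careful schemes, e.g. randomized pseudo-harmonic mapping):

```python
def translate_bet(bet, abstract_sizes):
    # Map an observed bet to the nearest bet size in the action abstraction.
    return min(abstract_sizes, key=lambda s: abs(s - bet))

# Abstraction allowing bets of 0.5x, 1x, 2x pot, with a pot of 100 chips.
sizes = [50, 100, 200]
print(translate_bet(130, sizes))  # 130 maps to the 1x-pot bet: 100
print(translate_bet(180, sizes))  # 180 maps to the 2x-pot bet: 200
```

Deterministic nearest-neighbor mapping is exploitable (an opponent can bet just inside a boundary), which is why the randomized translation schemes exist.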