twitter

Monday, May 27, 2019

Monte Carlo Simulation of BlackJack Hands

Monte Carlo Simulation of BlackJack Hands
There is a lot of literature, and practical expertise and advice, available for the game of BlackJack. I wanted to understand this game, and learn about Monte Carlo simulations in python. So, I set out to create/reproduce strategy tables for the game that can be created in near real time for any shoe. What follows is my understanding of the game, and my analysis. Python code is also provided.

The outcome of any one round in blackjack depends on

  • the current shoe (the cards that are left in it)
  • the dealer's face card
  • the player's hand

The player's decision points are dependent on the composition of the shoe. The player has to decide:

  1. How much to bet before the round begins (a function of expected returns)
  2. What stay threshold to use (i.e., when to stop requesting more cards) There are others, such as split/double/insurance, but I have ignored these for now.

In this NB, I wrote a basic BJ simulator and tested the relation between the current deck, dealer's face card, the player's stay threshold and the expected returns. Plots the result using seaborn heatmap. The simulator does not impliment doubling, spliting or insurance (all of which could increase the player's odds if done right)

As an aside, I don't know much about numba. I just added @jit, and magically, the code speed up achieved was 100x; INCREDIBLY fast; so fast it's almost VOODOO.

The code for the simulations is presented first. The analysis follows.

In [2]:
import numpy as np
import pandas as pd
from numba import jit
%matplotlib inline
import seaborn as sns; sns.set()
from matplotlib import pyplot as plt

The code has the following functions:

  • new_deck - makes a deck of 6 packs. All Aces count as 11.
  • payout - calculates who won, and how much
  • player - pops cards from the show until the sum of the cards is above the requested threshold
  • deal_hand - deals 2 cards for the player and 1 for the dealer
  • play_out_round - finishes the round, giving cards to the player until the sum of the cards crosses the requested threshold; then the dealer with a threshold of 17. Then determines payout.
  • tabulate and tabulate_threshold are helper functions to aggregate the results
In [3]:
@jit
def new_deck(takeout=[]):
    '''make a 6 pack deck, removing cards from the takeout list
    the takeout list is used to deliberately bias the deck if needed'''
    cards = np.repeat([2,3,4,5,6,7,8,9,10,10,10,10,11], 4 * 6)
    for i in takeout:
        cards = cards[cards != i]
    np.random.shuffle(cards)
    return cards

@jit
def payout(p_sum, d_sum):
    '''takes player's card sum and dealers card sum
    returns the player's payout'''
    if p_sum == 21:
        return 1.5 # 3:2 payment if blackjack
    elif p_sum == 21 and d_sum == 21:
        return 0
    elif p_sum > 21:
        return -1
    elif d_sum > 21:
        return 1
    elif p_sum > d_sum:
        return 1
    elif p_sum == d_sum:
        return 0
    else:
        return -1

@jit
def player(cards, stay_threshold = 13, current_cards = None):
    '''Takes a list of cards, a threshold above which to stay and the sum of current cards
    returns the cards sum after the dealing'''
    count = current_cards
    while count < stay_threshold:
        count += cards[0]
        cards = cards[1:]
    return cards, count

@jit
def deal_hand(deck):
    '''Deals player's 2 cards and the dealer's one
    Returns these and the deck'''
    player_2, deck = deck[0] + deck[1], deck[2:]
    dealer_1, deck = deck[2], deck[3:]
    return deck, player_2, dealer_1

@jit
def play_out_round(deck, p_sum, d_sum, threshold, hit = False):
    deck, p_sum = player(deck, stay_threshold = threshold, current_cards = p_sum)
    deck, d_sum = player(deck, stay_threshold=17, current_cards = d_sum)
    if hit:
        return payout(p_sum + deck[0], d_sum), p_sum, d_sum, len(deck)-1       
    else:
        return payout(p_sum, d_sum), p_sum, d_sum, len(deck)        

@jit()
def tabulate(table):
    ''' takes what simulate returns and aggregates'''
    # Convert the results to pandas DF, pivot and aggregate.
    df = pd.DataFrame(table)
    df.columns = ['DealerHand', 'PlayerHand', 'Payoff', 'Threshold', "Cards Left In Deck"]
    pvt = pd.pivot_table(df, index='DealerHand', columns='PlayerHand',
                   values='Payoff', aggfunc='mean', fill_value=0)
    return pvt.loc[:,:21]

@jit()
def tabulate_threshold(table):
    ''' takes what simulate returns and aggregates'''
    # Convert the results to pandas DF, pivot and aggregate.
    df = pd.DataFrame(table)
    df.columns = ['DealerHand', 'PlayerHand', 'Payoff', 'Threshold', "Cards Left In Deck"]
    pvt = pd.pivot_table(df, index='DealerHand', columns='Threshold',
                   values='Payoff', aggfunc='mean', fill_value=0)
                   #values='Payoff', aggfunc='std', fill_value=0)
    return pvt.loc[:,:21]

The following functions simulate 1000000 single rounds using the given shoe.

  • simulate_stay simulates the results of staying when the threshold is crossed
  • simulate_hit does one more hit
In [4]:
# Simulate 1000000 rounds from the same deck of cards
@jit(nopython = True) 
def simulate_stay(passed_deck):
    ''' take a deck, simulate 1 round many times with staying at random values
    return the results'''
    table = []
    for _ in range(1000000):
        deck = np.copy(passed_deck)
        np.random.shuffle(deck)
        rand_threshold = np.random.randint(5, 22)
        deck, p, de = deal_hand(deck)
        pay, p_sum, d_sum, test = play_out_round(deck, p, de, rand_threshold)
        #table.append([de, rand_threshold, pay])
        table.append([de, p_sum, pay, rand_threshold, test])
    return table

@jit(nopython = True) 
def simulate_hit(passed_deck):
    ''' Given a hand for the player and the dealer, should I hit or stay?
    This shows what will happen if you hit (the previous function simulates staying)'''
    table = []
    for _ in range(1000000):
        deck = np.copy(passed_deck)
        np.random.shuffle(deck)
        rand_threshold = np.random.randint(5, 22)
        deck, p, de = deal_hand(deck)
        pay, p_sum, d_sum, test = play_out_round(deck, p, de, rand_threshold, hit=True)
        table.append([de, p_sum, pay, rand_threshold, test])
    return table

The final result is a table of expected returns (from -1.0 to +1.5) for every dealer hand and every threshold. Plotting the table as a heatmap is useful to identify trends.

I should explain what I mean by threshold - this is the count that the player chooses at or above which to stand. For e.g., if the dealer has 7 and the player chooses a threshold of 15, once the player's cards sum to 15 or greater, he/she will stand.

In [ ]:
 

Simulation of expected return for various stay thresholds in a full shoe (6 decks)

In [9]:
deck = new_deck(takeout=[])
simulation = simulate_stay(deck)
table = tabulate_threshold(simulation)
sns.heatmap(table)
table
Out[9]:
Threshold 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0 17.0 18.0 19.0 20.0 21.0
DealerHand
2.0 0.063176 0.060476 0.050734 0.038131 0.087706 0.102330 0.115663 0.155658 0.150054 0.137494 0.106211 0.033453 0.023417 -0.032935 -0.157519 -0.325340 -0.624136
3.0 0.112817 0.091524 0.087023 0.086474 0.098039 0.099599 0.178635 0.156534 0.184001 0.117815 0.136902 0.099824 0.056179 -0.047798 -0.172914 -0.337212 -0.632809
4.0 0.134016 0.115969 0.113323 0.130031 0.142112 0.125528 0.186644 0.199439 0.198240 0.162505 0.110344 0.118851 0.045926 -0.027453 -0.132927 -0.315567 -0.646081
5.0 0.164122 0.155543 0.171438 0.174202 0.160516 0.182573 0.219790 0.242316 0.217712 0.197029 0.183021 0.112570 0.087654 -0.027128 -0.121743 -0.311527 -0.646924
6.0 0.152935 0.160390 0.138491 0.129592 0.165126 0.164367 0.194306 0.251487 0.203556 0.213866 0.174100 0.152732 0.079533 0.002099 -0.119978 -0.324462 -0.648733
7.0 -0.068621 -0.074625 -0.059757 -0.077747 -0.044674 -0.011740 0.023920 0.086845 0.070122 0.101823 0.089062 0.084215 0.106395 0.060780 -0.111075 -0.292912 -0.651046
8.0 -0.106260 -0.133828 -0.114590 -0.148858 -0.142763 -0.112500 -0.050291 -0.012439 0.009866 0.023074 0.023746 0.048156 0.058481 0.022752 -0.087489 -0.280596 -0.640837
9.0 -0.215054 -0.199479 -0.181219 -0.183559 -0.166161 -0.188720 -0.123985 -0.076115 -0.100068 -0.084974 -0.061785 -0.068954 -0.043197 -0.085430 -0.139656 -0.318012 -0.621935
10.0 -0.291139 -0.288300 -0.282475 -0.281446 -0.276915 -0.257782 -0.240855 -0.198129 -0.188820 -0.178635 -0.172357 -0.159963 -0.166159 -0.185491 -0.251616 -0.357464 -0.656097
11.0 -0.229097 -0.244160 -0.255727 -0.276720 -0.251239 -0.239982 -0.211331 -0.178093 -0.170068 -0.184065 -0.171245 -0.123918 -0.174304 -0.203829 -0.286237 -0.431411 -0.637728

The table above shows the result of 1 million simulations. An example interpretation: if the dealer has 6, and the player sets her threshold to 12 (i.e., stands at or above 12), her expected return is \$0.242316 for every \$1.00 invested. If the dealer has 9 or above, the player has no threshold that has a positive expected value.

In [ ]:
 

The best threshold for each dealer hand

If the dealer has 2, the most profitable threshold is 12 (i.e., stay at or above 12). Dealer has 8 = the most profitable threshold is at 17.

In [17]:
df=pd.concat([table.idxmax(axis=1), table.max(axis=1)], axis=1)
df.columns = ['Optimal Threshold', 'Expected Return']
df_optimal = df['Optimal Threshold']
df_optimal
Out[17]:
DealerHand
2.0     12.0
3.0     13.0
4.0     12.0
5.0     12.0
6.0     12.0
7.0     17.0
8.0     17.0
9.0     17.0
10.0    16.0
11.0    16.0
Name: Optimal Threshold, dtype: float64
In [ ]:
 

The mean and standard deviation of expected returns

If a player plays a 100 hands, the expected return is \$2.377 with a standard deviation of \$10.28

In [22]:
# Simulate 1 million trials
t = simulate_stay(new_deck(takeout=[]))
df = pd.DataFrame(t)
df.columns = ['DealerHand', 'PlayerHand', 'Payoff', 'Threshold', 'Test']

# Only choose optimal thresholds (optimal defined using previous simulations)
l = []
for k, v in df_optimal.to_dict().items():
    l.append(df[((df.DealerHand==k) & (df.Threshold==v))])

optimal_subset = pd.concat(l)

# Randomly take a 100 results and determine cumsum(). Do this 10000 times. Describe.
# The idea is to see what happens if you play many sittings of a 100 rounds each
x = pd.Series([optimal_subset.sample(100).reset_index().Payoff.cumsum().iloc[-1] for _ in range(10000)])
x.hist(bins=50)
x.describe()
Out[22]:
count    10000.000000
mean         2.377000
std         10.281758
min        -33.500000
25%         -4.500000
50%          2.500000
75%          9.500000
max         46.000000
dtype: float64
In [ ]:
 

The results are very optimistic. I am convinced that the code is correct. I wonder if the rules as I have implimented them are incomplete/incorrect. I hope this is useful to someone. I welcome comment/suggestions for improvement.

In [ ]: