I've been playing a lot of chess recently, and it has gotten me interested in how the game should optimally be played. In college I took an intro to AI course, and one of the first things that we covered was an algorithm called minimax, which allows you to discover the optimal play for any two player, fully determined zero-sum game. This post will be the first in a series dedicated to understanding how to solve games such as this. Later on we will dive into Monte Carlo Tree Search (MCTS), reinforcement learning and the use of neural networks to determine policies. For now, let's start with minimax.
It is important to stay consistent with terminology, so let's define exactly what we mean by some common terms:
- state: A unique possible configuration of the system in question. For example, in the case of chess this would be the positions of each piece on the board. This could be represented in a variety of ways, for example a 2d array.
- state space: The set of all possible states of a system.
- action: Any move that transitions a system from one state to the next. In chess this would be a valid move resulting in a piece leaving a position
(a, b)and arriving at the position
- heuristic function: A function that takes in the current state and player as input and returns the favorability of that state to the current player.
Later blog posts will build on this set of definitions as we dive into more complex topics.
The basic idea of minimax is that given the fact that the state space is fully explorable, we can simply play out every possible game, and find out who wins which game. Then we propagate that information back up the constructed tree, choosing the action that led to the most favorable outcome for the current player.
More concretely, we construct a tree where each node is a state and each directed edge represents an action
s -> s'. We will use the game of tic-tac-toe for our examples, as it is simple enough to be easily understood but not so trivial that it would be pointless to run minimax. The root of the tree that we construct is just the empty board. The next layer of nodes is each valid move for the first player, "X". We can use rotational and translational symmetries of the board to exclude some states. Below is the first layer of the tree in question:
Now let's assume that each of these three states have a value associated with them, which represents how good the state is for the player currently at the root of the tree - "X". Obviously, we assume that "X" wants to win, which means maximize the "goodness" of the next state. We will lock down exactly how this value is calculated in the future.
Let's also make the reasonable assumption that the "O" player wants to win too. This means that they want to maximize the favorability of their state as well. Equivalently, we may say that they want to minimize the favorability of a state for player "X", because it is a two player game. Player "X" should assume that player "O" will play optimally, so that they pick the action that minimizes the value of all the states they can transition to. So player "X" wants to take the state that maximizes its probability of winning over the set of next states, which in turn is a minimum of player "O"'s probability of winning.
The algorithm goes something like this: construct a tree of all possible states of the game. Start at the leaf nodes of the tree, which occur when one of the players has won the game or there was a tie. We assign a value of 1 if the player whose turn it currently is won the game, -1 if they lost and 0 if the game ended in a tie. These values are then propagated back up the tree recursively, such that a node takes on the maximum value over all its children if it corresponds to a state where it is the current players turn, and the minimum value over its children if the state corresponds to the opposing players turn. In this way the root node should be the value corresponding to the favorability of the current state for the current player, assuming optimal play from both players. The optimal action is simply the action that leads to the optimal value, or the argmax over the child values.
To better understand this process, let's go through a visual example with the values being propagated up from the leaves. Below is the first step, where determine for any given row whether we are computing a minimum or maximum over the next states:
Once this is done, we calculate a min over these nodes to get the value of the nodes the next layer up:
Finally, we compute the minimax value of the root node. We choose to take the path that corresponds to that value - in this case the left path. Assuming that player "O" plays optimally, they will also choose the path outlined in green, which minimizes player "X"'s value. However, if player "O" chooses to play less than optimally, this is even better for X. In this case, player "X" can actually win the game instead of just being able to tie.
We will first define an interface for the type of games that we will be working with. This interface will allow us to design algorithms that are agnostic to the exact details of a particular game. Below is the interface we will use:
Now let's implement Tic Tac Toe using this interface.
Minimax can be implemented quite elegantly with recursion. For simplicity, we define a player with either a 1 or a 0. The implementation makes no assumptions about the way state is stored - this decision takes place within the Game class logic.
A complete implementation of solving Tic Tac Toe using minimax can be found here. I have also started a repo where I will be implementing different games and algorithms here - feel free to contribute!
The entire size of the tic-tac-toe state space is only 5,478 . This is easily traversed by our minimax algorithm, so it should be producing the optimal play for each player. Congratulations! You just solved tic-tac-toe! So what are the results? Who wins? Well, it turns out that optimal play results in a draw. Every. Single. Time. Funnily enough, many people think that the optimal solution to chess is also a draw, although the state space of chess is so much larger that this has yet to be confirmed. Next time your friend says they're unbeatable at tic-tic-tac toe, you can prove them wrong. Probably won't ever happen, but hey it can't hurt to be prepared.
Limited Horizon Search
Because it is often not feasable to fully explore the state space, (games like chess have a high branching factor), we can modify the algorithm to only construct the tree to a certain depth. Then instead of evaluating the leaf nodes based on the win/loss condition, we evalutate them based on a heuristic function that is supposed to be an estimate of how favorable a given state is to the current player. We can modify our implementation to do this with the following changes:
- Modify the base cases of
max_playto also return the reward after a certain depth
- Modify the reward to return the value of some heurisitic instead of just
[-1, 0, 1].
Alpha Beta Pruning
It turns out that the minimax algorithm visits a lot of unessesary states when performing its calculations. We can mitigate this using alpha-beta pruning, which eliminates branches of the tree that cannot be optimal based on the current stored node values. This improvement can greatly increase the speed at which the algorithm runs, while still returning optimal results.
In my next post I will dive into these two topics and describe an optimal strategy for tic-tac-toe play. If you want a challenge, try implementing limited horizon search yourself!
Subscribe to Zachary Marion
Get the latest posts delivered right to your inbox