I have programmed a game of n x n tic-tac-toe in Python and added a minimax search, which appears to work correctly. However, when I add alpha-beta pruning, the result of the search becomes dependent on the order in which nodes are visited.
In the minimax
function shown below, if I add the line random.shuffle(node._children)
after the call to node.create_children()
, the result of the minimax search becomes unpredictable (I am testing this by running computer vs. computer games through the GUI and then diff
ing the resulting game files). However, if I delete the two if alpha >= beta: break
statements, then the node shuffling has no effect on the result of the search.
I originally discovered this bug because I tried to sort the child nodes in order to improve the effectiveness of the pruning. Changing the order of the nodes in any way (reversing, sorting, shuffling, etc.) changes the result of the search, as long as those two if
statements remain in place. This leads me to conclude that those if
statements somehow cause the search to become dependent on the order in which nodes are visited.
The function is roughly based on this pseudocode. The primary difference is that my minimax
function only serves to set the value of each node and does not return a value.
Below is the function definition. The full code is here (scroll up for class definitions). The minimax
function is called by Tree.get_next_board
(here), which is called from the GUI, whenever the engine makes a move. There is a fair amount of statefulness in the code that I would like to eventually reduce, but I am hoping there is a more obvious reason for the problems in my algorithm.
def minimax(node: Node, stats: Stats, alpha=core.NEG_INF, beta=core.INF, depth=8):
stats.visited += 1
if node.is_leaf():
return
if depth == 0:
node.set_val(eval_board(node.get_board()))
return
stats.created += node.create_children()
if node.is_max_node():
new_val = core.NEG_INF
for child in node.get_children():
minimax(child, stats, alpha, beta, depth - 1)
new_val = max(new_val, child.get_val())
alpha = max(alpha, new_val)
if alpha >= beta:
break
else:
new_val = core.INF
for child in node.get_children():
minimax(child, stats, alpha, beta, depth - 1)
new_val = min(new_val, child.get_val())
beta = min(beta, new_val)
if alpha >= beta:
break
node.set_val(new_val)
Does anyone see an obvious reason that adding alpha-beta pruning would make my search dependent on the order in which the nodes are visited? If not, can anyone suggest common causes of such a problem? If it is likely that the problem is buried within the statefulness of my code, I would welcome suggestions on how to reduce the statefulness while still using a tree to cache the boards. If all else fails, I think my best option would be to re-implement minimax
as a pure function with no tree or other statefulness and see if that fixes the problem.
If anyone is curious to run the code, they can download the tic_tac_toe
module and run python3 -m tic_tac_toe
(known to work with Python 3.8.6 on Linux).
It turns out that adding alpha-beta pruning was not actually changing the optimal node value identified by the minimax function, but was changing which one of multiple equally optimal children of the root was selected for the game engine's next move. I had already noticed this behavior with the plain minimax algorithm and had implemented a method (not shown in my question) of breaking ties that was independent of the order in which nodes were visited. However, my understanding of alpha-beta pruning is that it terminates the search after identifying an optimal node, which means that there may be other optimal nodes that are never identified, so adding alpha-beta pruning caused my game engine to choose different (but still optimal) moves depending on the order in which the nodes were visited.
The code shown in my question may still suffer from bugs as a result of its statefulness. I have since refactored the minimax
function and the Node
class to reduce the statefulness of the code as much as possible, while still caching the search results in a tree.
Also see my related question here regarding the behavior of equally optimal nodes during alpha-beta pruning.