pythonalgorithmtic-tac-toeminimaxgame-theory

Why does the result of my alpha-beta search depend on the order in which nodes are visited?


I have programmed a game of n x n tic-tac-toe in Python and added a minimax search, which appears to work correctly. However, when I add alpha-beta pruning, the result of the search becomes dependent on the order in which nodes are visited.

In the minimax function shown below, if I add the line random.shuffle(node._children) after the call to node.create_children(), the result of the minimax search becomes unpredictable (I am testing this by running computer vs. computer games through the GUI and then diffing the resulting game files). However, if I delete the two if alpha >= beta: break statements, then the node shuffling has no effect on the result of the search.

I originally discovered this bug because I tried to sort the child nodes in order to improve the effectiveness of the pruning. Changing the order of the nodes in any way (reversing, sorting, shuffling, etc.) changes the result of the search, as long as those two if statements remain in place. This leads me to conclude that those if statements somehow cause the search to become dependent on the order in which nodes are visited.

The function is roughly based on this pseudocode. The primary difference is that my minimax function only serves to set the value of each node and does not return a value.

Below is the function definition. The full code is here (scroll up for class definitions). The minimax function is called by Tree.get_next_board (here), which is called from the GUI, whenever the engine makes a move. There is a fair amount of statefulness in the code that I would like to eventually reduce, but I am hoping there is a more obvious reason for the problems in my algorithm.

def minimax(node: Node, stats: Stats, alpha=core.NEG_INF, beta=core.INF, depth=8):
    stats.visited += 1

    if node.is_leaf():
        return

    if depth == 0:
        node.set_val(eval_board(node.get_board()))
        return

    stats.created += node.create_children()

    if node.is_max_node():
        new_val = core.NEG_INF
        for child in node.get_children():
            minimax(child, stats, alpha, beta, depth - 1)
            new_val = max(new_val, child.get_val())
            alpha = max(alpha, new_val)
            if alpha >= beta:
                break
    else:
        new_val = core.INF
        for child in node.get_children():
            minimax(child, stats, alpha, beta, depth - 1)
            new_val = min(new_val, child.get_val())
            beta = min(beta, new_val)
            if alpha >= beta:
                break

    node.set_val(new_val)

Does anyone see an obvious reason that adding alpha-beta pruning would make my search dependent on the order in which the nodes are visited? If not, can anyone suggest common causes of such a problem? If it is likely that the problem is buried within the statefulness of my code, I would welcome suggestions on how to reduce the statefulness while still using a tree to cache the boards. If all else fails, I think my best option would be to re-implement minimax as a pure function with no tree or other statefulness and see if that fixes the problem.

If anyone is curious to run the code, they can download the tic_tac_toe module and run python3 -m tic_tac_toe (known to work with Python 3.8.6 on Linux).


Solution

  • It turns out that adding alpha-beta pruning was not actually changing the optimal node value identified by the minimax function, but was changing which one of multiple equally optimal children of the root was selected for the game engine's next move. I had already noticed this behavior with the plain minimax algorithm and had implemented a method (not shown in my question) of breaking ties that was independent of the order in which nodes were visited. However, my understanding of alpha-beta pruning is that it terminates the search after identifying an optimal node, which means that there may be other optimal nodes that are never identified, so adding alpha-beta pruning caused my game engine to choose different (but still optimal) moves depending on the order in which the nodes were visited.

    The code shown in my question may still suffer from bugs as a result of its statefulness. I have since refactored the minimax function and the Node class to reduce the statefulness of the code as much as possible, while still caching the search results in a tree.

    Also see my related question here regarding the behavior of equally optimal nodes during alpha-beta pruning.