pythonalgorithmnumpynegamax

Python Negamax Algorithm


I have about as simple of a negamax algorithm as possible, for evaluating positions in Tic Tac Toe. The state of the game is stored as an array in numpy, with X's pieces represented by 1, and O's pieces represented by four.

I was testing this just now, and found:

a = np.zeros(9).reshape(3,3)
negaMax(a, 6, 1) # Returned zero as it should
negaMax(a, 7, 1) # Returns 100

Meaning that my algorithm thinks it has found a way for X to win in seven plies in a game of Tic Tac Toe, which is obviously impossible against decent play. I can't work out how to have it print the best moves it has found, so am having real trouble debugging this. What am I doing wrong?

def winCheck(state):
        """Takes a position, and returns the outcome of that game"""
        # Sums which correspond to a line across a column
        winNums = list(state.sum(axis=0))
        # Sums which correspond to a line across a row
        winNums.extend(list(state.sum(axis=1)))
        # Sums which correspond to a line across the main diagonal
        winNums.append(state.trace())
        # Sums which correspond to a line across the off diagonal
        winNums.append(np.flipud(state).trace())

        if Square.m in winNums:
                return 'X'
        elif (Square.m**2 + Square.m) in winNums:
                return 'O'
        elif np.count_nonzero(state) == Square.m**2:
                return 'D'
        else:
                return None

def moveFind(state):
        """Takes a position as an nparray and determines the legal moves"""
        moveChoices = []

        # Iterate over state, to determine which squares are empty
        it = np.nditer(state, flags=['multi_index'])
        while not it.finished:
            if it[0] == 0:
                    moveChoices.append(it.multi_index)
            it.iternext()
        return moveChoices

def moveSim(state, move, player):
        """Create the state of the player having moved without interfering with the board"""
        simState = state.copy()
        if player == 1:
                simState[move] = 1
        else:
                simState[move] = gamecfg.n + 1
        return simState

def positionScore(state):
        """The game is either won or lost"""
        if winCheck(state) == 'X':
                return 100
        elif winCheck(state) == 'O':
                return -100
        else:
                return 0

def negaMax(state, depth, colour):
        """Recursively find the best move via a negamax search"""
        if depth == 0:
                return positionScore(state) * colour

        highScore = -100

        moveList = moveFind(state)
        for move in moveList:
                score = -negaMax(moveSim(state, move, colour), depth -1, colour * -1)
                highScore = max(score, highScore)

        return highScore

Solution

  • Your code does not consider the game to stop when a line of 3 symbols is made.

    This means that it is playing a variant of tic-tac-toe where X wins if he makes a line of 3 even after O has made a line of 3.

    For this variant, the program has correctly found that it is possible for X to always win!

    (I came across the same situation with a chess program I made where the computer was happy to sacrifice its king if it would reach checkmate a little later...)