I have about as simple of a negamax algorithm as possible, for evaluating positions in Tic Tac Toe. The state of the game is stored as an array in numpy, with X's pieces represented by 1, and O's pieces represented by four.
I was testing this just now, and found:
a = np.zeros(9).reshape(3,3)
negaMax(a, 6, 1) # Returned zero as it should
negaMax(a, 7, 1) # Returns 100
Meaning that my algorithm thinks it has found a way for X to win in seven plies in a game of Tic Tac Toe, which is obviously impossible against decent play. I can't work out how to have it print the best moves it has found, so am having real trouble debugging this. What am I doing wrong?
def winCheck(state):
"""Takes a position, and returns the outcome of that game"""
# Sums which correspond to a line across a column
winNums = list(state.sum(axis=0))
# Sums which correspond to a line across a row
winNums.extend(list(state.sum(axis=1)))
# Sums which correspond to a line across the main diagonal
winNums.append(state.trace())
# Sums which correspond to a line across the off diagonal
winNums.append(np.flipud(state).trace())
if Square.m in winNums:
return 'X'
elif (Square.m**2 + Square.m) in winNums:
return 'O'
elif np.count_nonzero(state) == Square.m**2:
return 'D'
else:
return None
def moveFind(state):
"""Takes a position as an nparray and determines the legal moves"""
moveChoices = []
# Iterate over state, to determine which squares are empty
it = np.nditer(state, flags=['multi_index'])
while not it.finished:
if it[0] == 0:
moveChoices.append(it.multi_index)
it.iternext()
return moveChoices
def moveSim(state, move, player):
"""Create the state of the player having moved without interfering with the board"""
simState = state.copy()
if player == 1:
simState[move] = 1
else:
simState[move] = gamecfg.n + 1
return simState
def positionScore(state):
"""The game is either won or lost"""
if winCheck(state) == 'X':
return 100
elif winCheck(state) == 'O':
return -100
else:
return 0
def negaMax(state, depth, colour):
"""Recursively find the best move via a negamax search"""
if depth == 0:
return positionScore(state) * colour
highScore = -100
moveList = moveFind(state)
for move in moveList:
score = -negaMax(moveSim(state, move, colour), depth -1, colour * -1)
highScore = max(score, highScore)
return highScore
Your code does not consider the game to stop when a line of 3 symbols is made.
This means that it is playing a variant of tic-tac-toe where X wins if he makes a line of 3 even after O has made a line of 3.
For this variant, the program has correctly found that it is possible for X to always win!
(I came across the same situation with a chess program I made where the computer was happy to sacrifice its king if it would reach checkmate a little later...)