def minimax(
    board: Board,
    depth: int,
    max_depth: int,
    is_black: bool
) -> tuple[Score, Move]:
    """
    Finds the best move for the input board state.
    Note that you are black.

    Parameters
    ----------
    board: 2D list of lists. Contains characters "B", "W", and "_",
        representing black pawn, white pawn, and empty cell, respectively.
    depth: int, the depth to search for the best move. When this is equal
        to `max_depth`, you should get the evaluation of the position using
        the provided heuristic function.
    max_depth: int, the maximum depth for cutoff.
    is_black: bool. True when finding the best move for black, False
        otherwise.

    Returns
    -------
    A tuple (evaluation, ((src_row, src_col), (dst_row, dst_col))):
        evaluation: the best score that black can achieve after this move.
        src_row, src_col: position of the pawn to move.
        dst_row, dst_col: position to move the pawn to.
    """
    def max_value(board, depth):
        if utils.is_game_over(board) or depth == max_depth:  # game over or depth cutoff
            return evaluate(board), None  # return the current score along with the action
        v = float('-inf')
        best_action = None
        for action in generate_valid_moves(board):
            next_board = utils.state_change(board, action[0], action[1], False)  # generate the next state
            # Invert the board for white's turns: all the helper functions
            # assume black to move, so inverting flips the board and swaps
            # white and black.
            next_board = utils.invert_board(next_board, False)
            score, _ = max_value(next_board, depth + 1)  # score of the next state
            if score > v:
                v = score
                best_action = action
        return v, best_action

    return max_value(board, depth)
I have tried to implement a minimax algorithm to get the optimal move for black or white up to a maximum depth. When I run it with my test case, it just returns the first action instead of the optimal action, i.e. the one with the highest evaluation among all moves. I can't figure out the problem in my code.
def minimax(
    board: Board,
    depth: int,
    max_depth: int,
    is_black: bool
) -> tuple[Score, Move]:
    """
    Finds the best move for the input board state.
    Note that you are black.

    Parameters
    ----------
    board: 2D list of lists. Contains characters "B", "W", and "_",
        representing black pawn, white pawn, and empty cell, respectively.
    depth: int, the depth to search for the best move. When this is equal
        to `max_depth`, you should get the evaluation of the position using
        the provided heuristic function.
    max_depth: int, the maximum depth for cutoff.
    is_black: bool. True when finding the best move for black, False
        otherwise.

    Returns
    -------
    A tuple (evaluation, ((src_row, src_col), (dst_row, dst_col))):
        evaluation: the best score that black can achieve after this move.
        src_row, src_col: position of the pawn to move.
        dst_row, dst_col: position to move the pawn to.
    """
    if depth == max_depth or utils.is_game_over(board):
        return evaluate(board), None

    # Determine the best move and its evaluation
    if is_black:
        best_evaluation = float('-inf')
        best_move = None
        for action in generate_valid_moves(board):
            new_board = utils.state_change(board, action[0], action[1], in_place=False)
            opponent_evaluation, _ = minimax(new_board, depth + 1, max_depth, False)
            if opponent_evaluation > best_evaluation:
                best_evaluation = opponent_evaluation
                best_move = (action[0], action[1])
        return best_evaluation, best_move
    else:
        best_evaluation = float('inf')
        best_move = None
        for action in generate_valid_moves(utils.invert_board(board, in_place=False)):
            new_board = utils.state_change(utils.invert_board(board, in_place=False), action[0], action[1], in_place=False)
            opponent_evaluation, _ = minimax(utils.invert_board(new_board, in_place=False), depth + 1, max_depth, True)
            if opponent_evaluation < best_evaluation:
                best_evaluation = opponent_evaluation
                best_move = (action[0], action[1])  # Convert from black's perspective to white's
        return best_evaluation, best_move
I then tried a different approach that switches cases between black's turn and white's turn, using the provided invert_board implementation. It passes the public test cases, but it fails with the message "Make sure that 'evaluate' is called with the correct input, especially for white", which I don't understand. I thought my evaluate was applied correctly for both white and black, since I invert the board after white's turn.
Assuming the functions you call are all correct (like invert_board, ...), there is still this one problem: although the board is inverted, you have not inverted the score. What is good for white is bad for black, and vice versa, so you should negate the score.
I'd change this:
score, _ = max_value(next_board, depth + 1)
to this:
score = -max_value(next_board, depth + 1)[0]
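With that negation in place, your single max_value function is effectively a negamax. Here is a minimal self-contained sketch of the idea on a toy game tree, with nested lists standing in for boards and leaf integers standing in for heuristic scores (so all names here are hypothetical, not your actual helpers):

```python
def negamax(node, depth, max_depth):
    # A leaf (or any node once the cutoff is reached) is scored from the
    # point of view of the side to move.
    if depth == max_depth or not isinstance(node, list):
        return node, None
    best_score, best_move = float('-inf'), None
    for move, child in enumerate(node):
        # The recursive call scores the child for the *opponent*,
        # so negate it to get the score from our point of view.
        score = -negamax(child, depth + 1, max_depth)[0]
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move
```

Without the minus sign, every level would maximise the opponent's score as well, which is exactly why the search keeps returning arbitrary moves.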
Some unrelated comments:
I would have the depth count down instead of up. That way you can compare with 0 (for the base case), and you don't need access to max_depth in this function. The initial caller of minimax should then pass max_depth for the depth parameter, and the function can do without the max_depth parameter.
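For example (again a hypothetical sketch, with a nested-list tree standing in for the board and a leaf integer for its score):

```python
def search(node, depth):
    # Base case compares with 0; max_depth is only known to the initial caller.
    if depth == 0 or not isinstance(node, list):
        # Leaf: its own score; an internal node cut off at depth 0 is
        # scored 0 here just to keep the sketch small.
        return node if isinstance(node, int) else 0
    # Negamax step: the child's score is from the opponent's viewpoint.
    return max(-search(child, depth - 1) for child in node)

# The caller passes the maximum depth once:
# best = search(root, max_depth)
```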
invert_board could have quite a negative impact on running times. Once you get this working, you may get performance improvements by making all your functions (like evaluate) aware of whose turn it is, and having them return a value from the viewpoint of that player.
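As an illustration, a side-aware material count (a hypothetical stand-in for your evaluate, using the "B"/"W"/"_" board representation from your docstring) needs no board inversion at all:

```python
def evaluate(board, is_black):
    # Material difference counted from black's point of view.
    score = 0
    for row in board:
        for cell in row:
            if cell == "B":
                score += 1
            elif cell == "W":
                score -= 1
    # Return it from the viewpoint of the side to move,
    # instead of inverting the board before calling evaluate.
    return score if is_black else -score
```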
Related to the above point: the minimax function has an is_black parameter. I hope you have dealt with that information in the code you have not shared, i.e. make sure to invert the board if is_black is False before calling your max_value function.
state_change could also have a negative impact on running times. If so, you can improve on that by mutating board (so not making a copy), and then creating a function that can perform an undo of a previous move. You'd call this after having made the recursive call.
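The make/undo pattern could look like this sketch (hypothetical helper names; your state_change may track more state, such as captures with different rules):

```python
def apply_move(board, src, dst):
    (sr, sc), (dr, dc) = src, dst
    captured = board[dr][dc]        # remember what was on the target square
    board[dr][dc] = board[sr][sc]   # move the pawn in place, no copy
    board[sr][sc] = "_"
    return captured

def undo_move(board, src, dst, captured):
    (sr, sc), (dr, dc) = src, dst
    board[sr][sc] = board[dr][dc]   # move the pawn back
    board[dr][dc] = captured        # restore the captured piece (or "_")
```

In the search loop you would call apply_move before the recursive call and undo_move right after it, so the same board object is reused down the whole tree.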
The base case if statement could perform the two conditions in the opposite order: that will save you a call of is_game_over when the maximum depth has been reached.
Related to this: is_game_over could also return the evaluation score in case the game is over. So it could either return a score (indicating the game is over) or None (indicating the game is not over). That can then be used to save a separate call to evaluate, which probably would have to do the same game-over checks again.
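A sketch of that combined check, assuming a breakthrough-style win condition where a pawn reaching the far row ends the game (your actual rules may differ, so treat the names and values here as placeholders):

```python
WIN, LOSS = 1000, -1000

def terminal_score(board):
    last_row = len(board) - 1
    if "B" in board[last_row]:   # a black pawn reached the far side: black wins
        return WIN
    if "W" in board[0]:          # a white pawn reached black's side: black loses
        return LOSS
    return None                  # game not over

# In the base case:
#     score = terminal_score(board)
#     if score is not None:
#         return score, None
```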
Consider implementing alpha-beta pruning: it may help to reduce the number of states to analyse, while it still ensures the same score is returned.
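In negamax form the pruning is only a few extra lines; here is a sketch on a toy tree (nested lists in place of boards, hypothetical names). alpha and beta bound the window of scores still worth exploring, and a branch is abandoned as soon as its score meets or exceeds beta:

```python
def alphabeta(node, depth, alpha, beta):
    if depth == 0 or not isinstance(node, list):
        return node
    best = float('-inf')
    for child in node:
        # Negate score and swap/negate the window for the opponent.
        best = max(best, -alphabeta(child, depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:   # cutoff: the opponent will avoid this branch
            break
    return best

# Initial call: alphabeta(root, max_depth, float('-inf'), float('inf'))
```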