pythonalgorithmminimax

When should I include the score benefit of a local decision when using minimax?


In the Stone Game problem, Alice and Bob take turns picking a pile of stones from the start or the end. The goal is to maximize Alice's total

def play(turn, left, right):
    if left > right:
        return 0

    end = piles[right] + play(1 - turn, left, right - 1)
    start = piles[left] + play(1 - turn, left + 1, right)

    return max(start, end) if turn == 0 else min(start, end)

alice = play(0, 0, n - 1)

This follows the classic minimax algorithm.

Let's now take a look at Stone Game II. In this problem, Alice and Bob can pick the next 1 <= x <= 2m piles of stones, where m is the maximum x somebody has used.

To my surprise, classic minimax would return the same number of stones whether it is Alice or Bob's turn, giving us an incorrect final answer

# DOESN'T WORK
def play(left, m, turn):
    if left == n-1:
        return 0

    total = 0
    ans = inf if turn else -inf
    for pos in range(left+1, min(n, left+2*m+1)):
        total += piles[pos]
        value = total + play(pos, max(m, pos - left), 1 - turn)
        if turn == 0:
            ans = max(ans, value)
        else:
            ans = min(ans, value)
    
    return ans

alice = play(-1, 1, 0)

However, if we only include total in Alice's calculation, it suddenly works:

# WORKS
def play(left, m, turn):
    if left == n-1:
        return 0

    total = 0
    ans = inf if turn else -inf
    for pos in range(left+1, min(n, left+2*m+1)):
        total += piles[pos]
        value = play(pos, max(m, pos - left), 1 - turn)
        if turn == 0:
            ans = max(ans, total + value)
        else:
            ans = min(ans, value)
    
    return ans

alice = play(-1, 1, 0)

Could someone explain why we're not supposed to add the local total for the minimizer in the second example?

Here's a discrepancy I noticed that may have something to do with the answer: value is always the same recursive call when we take min/max in the second problem, but in the first problem, end and start are different recursive calls.


Solution

  • The function we're trying to maximize is f := amount of stones alice gets. That's why we only add stones for Alice; the function we're maximizing doesn't include the amount of stones Bob gets.

    So then why does the first algorithm work? Turns out it's not generally correct, and only works because this specific problem constrains len(piles) as even, making it so Alice always wins. If we actually look at the values from the first function, it returns the same answer for Bob and Alice too (this was just obfuscated because alice > sum(piles) - alice is always true, so the bug didn't produce wrong answers)

    An actually correct minimax implementation for game 1 would look like this:

    def play(turn, left):
        right_used = turn - left
        right = n - 1 - right_used
    
        if left > right:
            return 0
    
        end = play(turn + 1, left)
        start = play(turn + 1, left + 1)
    
        return min(start, end) if turn % 2 else max(piles[left] + start, piles[right] + end)