Tags: optimization, artificial-intelligence, simulation, chess, montecarlo

Monte Carlo Tree Search Implementation


I understand the theory behind MCTS, but here is my issue. Monte Carlo methods in games require simulating a game from the current state until a terminal state is reached. In order for the search to converge to minimax (the actually optimal move sequence), thousands, perhaps millions, of games must be simulated.

The way I am doing it now is by using the same move-generation functions I would use for a normal minimax search, as well as the same make-move function, coupled with checking for wins after each move. In complex games like chess or checkers (and even in simple games) these are very costly operations. My question is: is there a better way of implementing the game simulations to make them less costly? Can I avoid performing full move generation, and avoid checking for wins each time? Are there any other ways to improve simulation time?


Solution

  • My question is: Is there a better way of implementing the game simulations to make them less costly? Can I not perform full move generation, and not check for wins each time?

    No, you can't really avoid move generation and checking for terminal game states. If you don't generate moves, you can't select moves to play either (which you obviously need to do to advance a simulation). If you don't check for terminal game states, you'll get simulations that don't correspond to legal games, and you'll keep simulations running for longer than necessary. The baseline play-out loop sketched below needs both operations on every step.
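    To make that dependence concrete, here is a minimal random play-out sketch. The game-interface functions legal_moves, apply_move, and winner are hypothetical placeholders for whatever your engine already provides:

        import random

        def random_playout(state):
            """Play random moves from `state` until the game ends; return the result."""
            while True:
                result = winner(state)         # terminal check: win / loss / draw?
                if result is not None:
                    return result              # e.g. +1 / 0 / -1 from the root player's view
                moves = legal_moves(state)     # move generation: needed to pick any move
                if not moves:
                    return 0                   # no legal moves; how to score this is game-specific
                state = apply_move(state, random.choice(moves))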

    Speeding up move generation

    In some games, it may be possible to select moves without generating ALL moves, by only generating some of them. For example, in chess you could first randomly pick a piece to move, then only generate moves for that piece and randomly select one of those (note that the chosen piece may turn out to have no legal moves, in which case you need to re-sample another piece). This would be faster than generating all legal moves for all pieces and then randomly selecting one of those. However, it also introduces a different bias into your move selection: the probability distribution over the moves you end up playing in simulations will differ from a uniform distribution over all legal moves. It's difficult to say whether this will be better or worse, but it will certainly be different. A sketch of this piece-first sampling follows below.
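    The following sketch illustrates piece-first sampling, assuming hypothetical helpers pieces(state, player) and moves_for_piece(state, piece). Note the bias: a piece with one legal move is as likely to be tried first as a piece with twenty.

        import random

        def sample_move_piece_first(state, player):
            candidates = list(pieces(state, player))
            random.shuffle(candidates)                 # try pieces in random order
            for piece in candidates:
                moves = moves_for_piece(state, piece)  # generate moves for ONE piece only
                if moves:
                    return random.choice(moves)
            return None                                # no piece has a legal move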

    Ending simulations early

    One of the advantages of MCTS is that it does not necessarily need a heuristic evaluation function for non-terminal game states. However, if you want, you can still use one. You could end simulations early (before reaching terminal states) and then use a heuristic evaluation function to evaluate those states. If you want to keep the guarantee of converging to optimal solutions given an infinite amount of processing time, make sure to only put a cap on the number of steps you take in the Play-out phase, and not in the Selection phase. In practice you tend not to have an infinite amount of processing time, so this difference will not matter too much (though I still suspect a small improvement from making the distinction). A sketch of such a capped play-out follows below.
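    A minimal sketch of an early-cutoff play-out, reusing the hypothetical game-interface functions from the earlier sketch, plus a hypothetical heuristic_value(state) that should return scores in the same range as true game results (e.g. [-1, 1]):

        import random

        def capped_playout(state, max_steps=50):
            for _ in range(max_steps):                 # cap applies to the play-out phase only
                result = winner(state)
                if result is not None:
                    return result                      # true terminal value, always preferred
                moves = legal_moves(state)
                if not moves:
                    return 0
                state = apply_move(state, random.choice(moves))
            return heuristic_value(state)              # cutoff reached: fall back to heuristic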

    Optimized implementations

    Of course, it may also be possible to simply optimize your move generation, terminal-game-state checking, and advance functions. It is difficult to tell how much room for improvement there is without seeing your code. But, for example, bitboard-based state representations lead to much more efficient functions than naive/straightforward representations; see the illustration below.
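    As an illustration of what bitboards buy you (my example, not from the question), here is the well-known Connect Four win check. With one player's discs encoded as a single integer using 7 bits per column (6 rows plus a sentinel bit), four-in-a-row in every direction is detected with a handful of shifts and ANDs instead of scanning the board cell by cell:

        def connect4_win(bb):
            """Return True if bitboard `bb` (one player's discs) contains four
            in a row; encoding is 7 bits per column, column-major."""
            for shift in (1, 7, 6, 8):                 # vertical, horizontal, both diagonals
                pairs = bb & (bb >> shift)             # marks every two-in-a-row
                if pairs & (pairs >> (2 * shift)):     # two adjacent pairs => four in a row
                    return True
            return False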