In the mcts algorithm described in Wikipedia, it performs exactly one playout(simulation) in each node selection. Now, I am experimenting this algorithm in a simple connect-k game. I wonder, in practice, do we perform more playouts to reduce the variance?
I tried the original algorithm with exactly one random playout (non-biased). The result is bad compared to my heuristic search with alpha-beta pruning. It converges very slowly. When I perform 500 playouts instead, the noise is a lot less. However, each node simulation is too slow for the algorithm to explore other parts of the tree in the given time hence missing the most critical move sometimes.
I then added the AMAF (in particular with RAVE transition) heuristic to the basic MCTS. I don't notice too much difference with 500 playouts perhaps because the variance is already low. I haven't analyzed the result with 1 playout yet.
Could anyone give me any insights?
Typically, you'd do exactly one play-out per selection step. However, subsequent selection steps can go through the same node multiple times.
Consider, for example, a case where there are only two moves available in the root node. If you then run, let's say, 10,000 complete iterations of MCTS (where one iteration = Selection + Expansion + Play-out + Backpropagation), each of the two nodes below the root node will get selected roughly 5,000 times (or maybe one gets selected 9,000 times and the other 1,000 times if the first is clearly a better option than the seocnd, but still, both get selected more than once).
Does this match what you are currently doing in your implementation? If not, try providing some code that you currently have so that we can see where it goes wrong. But if this is how you implemented it (which is how it should be), then there should be no problems with doing only one play-out per selection step