From many blogs and this one https://web.archive.org/web/20160308070346/http://mcts.ai/about/index.html We know that the process of MCTS algorithm has 4 steps.
- Selection: Starting at root node R, recursively select optimal child nodes until a leaf node L is reached.
What does leaf node L mean here? I thought it should be a node representing the terminal state of the game, or another word which ends the game. If L is not a terminal node (one end state of the game), how do we decide that the selection step stops on node L?
- Expansion: If L is a not a terminal node (i.e. it does not end the game) then create one or more child nodes and select one C.
From this description I realise that obviously my previous thought incorrect.
Then if L is not a terminal node, it implies that L should have children, why not continue finding a child from L at the "Selection" step?
Do we have the children list of L at this step?
From the description of this step itself, when do we create one child node, and when do we need to create more than one child nodes? Based on what rule/policy do we select node C?
- Simulation: Run a simulated playout from C until a result is achieved.
Because of the confusion of the 1st question, I totally cannot understand why we need to simulate the game. I thought from the selection step, we can reach the terminal node and the game should be ended on node L in this path. We even do not need to do "Expansion" because node L is the terminal node.
- Backpropagation: Update the current move sequence with the simulation result.
Fine. Last question, from where did you get the answer to these questions?
Thank you
BTW, I also post the same question https://ai.stackexchange.com/questions/16606/how-to-understand-the-4-steps-of-monte-carlo-tree-search
What does leaf node L mean here?
For the sake of explanation I'm assuming that all the children of a selected node are added during the expansion phase of the algorithm.
When the algorithm starts, the tree is formed only by the root node (a leaf node).
The Expansion phase adds all the states reachable from the root of the tree. Now you have a bigger tree where the leaves are the last added nodes (the root node isn't a leaf anymore).
At any given iteration of the algorithm, the tree (the gray area of the picture) grows. Some of its leaves could be terminal states (according to the rules of the game/problem) but it's not granted.
If you expand too much, you could run out of memory. So the typical implementation of the expansion phase only adds a single node to the existing tree.
In this scenario you could change the word leaf with not fully expanded:
Starting at root node R, recursively select optimal child nodes until a not fully expanded node L is reached
Based on what rule/policy do we select node C?
It's domain-dependent. Usually you randomly choose a move/state.
NOTES