python numpy reinforcement-learning mdptoolbox

Understanding the argument values for mdptoolbox forest example

I am trying to understand how to use mdptoolbox and had a few questions.

What does 20 mean in the following statement?

P, R = mdptoolbox.example.forest(10, 20, is_sparse=False)

I understand that 10 here denotes the number of possible states. What does 20 mean here? Does it represent the total number of actions per state? I want to restrict the MDP to exactly 2 actions per state. How could I do this?

The shape of P returned above is (2, 10, 10). What does 2 represent here? No matter what values I use for total states and actions, it is always 2.

Solution

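Your code is correct, but note that mdptoolbox.example.forest is a ready-made example problem shipped with the toolbox. Its arguments configure that particular forest-management MDP; it is not a general MDP constructor.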

Please go through the documentation carefully.

In the following code:

import mdptoolbox.example
P, R = mdptoolbox.example.forest(10, 20, is_sparse=False)

The second argument (r1) does not set the number of actions. The documentation describes it as follows:

The reward when the forest is in its oldest state and action ‘Wait’ is performed. Default: 4.

So in your call, a reward of 20 is earned when the forest is in its oldest state and the action 'Wait' is performed. Your reading of the first argument is right: 10 is the number of states S.
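To make each argument concrete, here is a minimal sketch that rebuilds the forest arrays with plain NumPy. It follows the documented behaviour of mdptoolbox.example.forest(S, r1, r2, p), where r2 is the reward for 'Cut' in the oldest state and p is the probability of a wildfire. It is not the library's source, it omits is_sparse, and the name forest_sketch is hypothetical.

```python
import numpy as np


def forest_sketch(S=3, r1=4, r2=2, p=0.1):
    """Hypothetical re-creation of the forest example as dense arrays.

    Parameter names mirror mdptoolbox.example.forest, but this is a sketch,
    not the library's code. Action 0 is 'Wait', action 1 is 'Cut'.
    """
    # Transitions: shape (A, S, S) with A = 2 actions.
    P = np.zeros((2, S, S))
    # Wait: the forest grows one age class with probability 1 - p ...
    P[0] = (1 - p) * np.diag(np.ones(S - 1), 1)
    # ... or burns down (back to state 0) with probability p.
    P[0, :, 0] = p
    # The oldest state stays oldest unless a fire occurs.
    P[0, S - 1, S - 1] = 1 - p
    # Cut: the forest always returns to the youngest state.
    P[1, :, 0] = 1.0

    # Rewards: shape (S, A).
    R = np.zeros((S, 2))
    R[S - 1, 0] = r1   # Wait in the oldest state earns r1
    R[:, 1] = 1.0      # Cut earns 1 in the intermediate states ...
    R[0, 1] = 0.0      # ... nothing when the forest is youngest
    R[S - 1, 1] = r2   # ... and r2 in the oldest state
    return P, R


P, R = forest_sketch(10, 20)
print(P.shape)         # (2, 10, 10)
print(R.shape)         # (10, 2)
print(R[-1].tolist())  # [20.0, 2.0]
```

The value 20 only appears in R[-1, 0], the (oldest state, Wait) entry, while the leading 2 in P's shape is fixed by the two actions.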

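In this example the forest is managed with exactly two actions, 'Wait' and 'Cut'; please refer to this documentation for more details. Because there are two actions, the transition probability matrix P returned by this function has shape (A, S, S) = (2, 10, 10): the first dimension indexes the action, so it stays 2 whatever number of states you choose. Likewise, R has shape (S, A) = (10, 2). The example therefore already has exactly two actions per state, and you do not need to restrict the action space yourself. If you need a different MDP, build your own P and R arrays with those shapes and pass them to one of the solvers in mdptoolbox.mdp.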
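Since the number of actions is simply the first axis of P, a custom MDP with exactly two actions per state just needs arrays shaped (2, S, S) and (S, 2). The sketch below assumes a made-up 3-state problem; the helpers check_shapes and value_iteration are hypothetical and hand-written here only to show how the axes of P and R are consumed. They are not mdptoolbox functions.

```python
import numpy as np

# Hypothetical 3-state MDP with exactly two actions (A = 2).
S, A = 3, 2

P = np.zeros((A, S, S))
# Action 0: advance one state (stay put in the last state).
P[0] = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0]])
# Action 1: jump back to state 0.
P[1, :, 0] = 1.0

# Rewards indexed as R[state, action].
R = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [5.0, 1.0]])


def check_shapes(P, R):
    """Hypothetical helper: basic sanity checks on (A, S, S) / (S, A) arrays."""
    A, S, S2 = P.shape
    assert S == S2, "each P[a] must be square (S x S)"
    assert R.shape == (S, A), "R must have shape (S, A)"
    assert (P >= 0).all(), "probabilities must be non-negative"
    assert np.allclose(P.sum(axis=2), 1.0), "each row of P[a] must sum to 1"
    return A, S


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Hypothetical helper: plain value iteration over the (A, S, S) layout."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Q[a, s] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R.T + gamma * (P @ V)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=0)


n_actions, n_states = check_shapes(P, R)
V, policy = value_iteration(P, R)
print(n_actions, n_states)  # 2 3
print(policy)
```

With pymdptoolbox installed, the same P and R can instead be handed to a library solver such as mdptoolbox.mdp.ValueIteration(P, R, 0.9), followed by .run() and reading .policy.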

For a broader introduction to using the toolbox, also go through this link.