[SOLVED] StableBaselines3 / PPO / model rolls out but does not learn?

When a model trains, there are two phases:

A rollout phase
A learning phase

My models run the rollout phase but never show a learning phase. This is apparent both in the text output of a Jupyter notebook in VS Code and in TensorBoard.

I built a very simple environment and tried many more timesteps. What I discovered was:

If there are too few timesteps, the output never shows a learning phase.

What is the minimum number of timesteps to learn?
Is this the same for all environments or does it depend upon your environment?

import os
import time

from stable_baselines3 import PPO

# `env` is the custom environment described above (not shown)
tic = time.perf_counter()

log_path = os.path.join('Training', 'Logs')
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_path)
modelRtn = model.learn(total_timesteps=1000, progress_bar=True)

toc = time.perf_counter()
print(f"Elapsed time: {toc - tic} sec")

Solution

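PPO has an n_steps parameter that defaults to 2048. collect_rollouts fills the rollout buffer with n_steps transitions (per environment) before it returns, so the first rollout alone collects 2048 timesteps. When execution returns to your learn call, the limit of 1000 timesteps has already been exceeded, so the loop stops after that single rollout and the log never shows a training section. The minimum therefore depends on n_steps (times the number of parallel environments) rather than on the environment itself. Either set total_timesteps to several multiples of n_steps, or pass a smaller n_steps to PPO.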
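To make the arithmetic concrete, here is a rough sketch in plain Python. The helper name ppo_iterations is hypothetical and not part of StableBaselines3. It only mirrors my reading of the learn loop, which keeps collecting full rollouts while the step counter is below total_timesteps, and it assumes a single environment unless n_envs is given.

```python
import math


def ppo_iterations(total_timesteps, n_steps=2048, n_envs=1):
    """Estimate how many rollouts a PPO run performs.

    Hypothetical helper (not an SB3 function). Assumes every rollout
    collects exactly n_steps * n_envs timesteps and that the loop
    continues while the collected count is below total_timesteps.
    """
    steps_per_rollout = n_steps * n_envs
    iterations = math.ceil(total_timesteps / steps_per_rollout)
    collected = iterations * steps_per_rollout
    return iterations, collected


# The setup from the question: one rollout, then the run ends.
print(ppo_iterations(1000))               # (1, 2048)

# More timesteps give several rollout/update cycles.
print(ppo_iterations(10_000))             # (5, 10240)

# Alternatively, a smaller n_steps fits several cycles into 1000 steps.
print(ppo_iterations(1000, n_steps=256))  # (4, 1024)
```

If the first number is 1, expect to see only rollout output. In practice, either raise total_timesteps well above n_steps or construct the model with a smaller n_steps, for example PPO("MlpPolicy", env, n_steps=256, verbose=1).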