Tags: python, tensorboard, stable-baselines

Stable-Baselines3 / PPO: model rolls out but does not learn?


When a model learns, there are two phases:

  1. A rollout phase
  2. A learning phase

My models go through the rollout phase, but they never show a learning phase. This is visible both in the text output of a Jupyter notebook in VS Code and in TensorBoard.

I built a very simple environment and tried many more timesteps. What I discovered was:

If there are too few timesteps, the model never shows that it learns.

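import os
import time

from stable_baselines3 import PPO

# env: the custom environment created earlier in the notebook (not shown here)
tic = time.perf_counter()

log_path = os.path.join('Training', 'Logs')
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_path)
# total_timesteps=1000 is smaller than PPO's default n_steps (2048)
modelRtn = model.learn(total_timesteps=1000, progress_bar=True)

toc = time.perf_counter()
print(f"Elapsed time: {toc - tic:.2f} sec")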

Solution

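  • Your PPO model has an n_steps parameter, which is 2048 by default. collect_rollouts fills the rollout buffer with n_steps (2048) steps per environment before execution returns to your learn method. At that point num_timesteps (2048) already exceeds the total_timesteps you set for the whole run (1000), so learning stops after that single iteration and a learning phase never shows up in the logs. Set total_timesteps to several multiples of n_steps (or pass a smaller n_steps to PPO) to see it.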
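To check how many rollout/update iterations a given budget produces, here is a minimal pure-Python model of the loop described above. It assumes learn keeps calling collect_rollouts while num_timesteps < total_timesteps and that each call adds exactly n_steps * n_envs steps; count_iterations is a hypothetical helper for illustration, not part of Stable-Baselines3.

```python
def count_iterations(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> int:
    """Hypothetical helper: count iterations of a simplified on-policy learn loop."""
    num_timesteps = 0
    iterations = 0
    # Simplified model (assumption): keep collecting rollouts until the budget
    # is reached; each rollout fills the whole buffer of n_steps per environment.
    while num_timesteps < total_timesteps:
        num_timesteps += n_steps * n_envs
        iterations += 1
    return iterations


if __name__ == "__main__":
    for total, n_steps in [(1_000, 2048), (10_000, 2048), (1_000, 256)]:
        print(f"total_timesteps={total}, n_steps={n_steps}: "
              f"{count_iterations(total, n_steps)} iteration(s)")
```

Under this model, the default n_steps=2048 with total_timesteps=1000 gives a single iteration, while total_timesteps=10_000 gives five and n_steps=256 with the original budget gives four, so there are repeated rollout/learning cycles to observe.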