When a model learns there is:
My models are rolling out but they never show a learning phase. This is apparent both in the text output in a Jupyter notebook in VS Code and in TensorBoard.
I built a very simple environment and tried many more timesteps. What I discovered was:
If there are too few timesteps, the model never displays that it learns.
import os
import time
from stable_baselines3 import PPO
# env is the simple environment mentioned above
tic = time.perf_counter()
log_path = os.path.join('Training', 'Logs')
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_path)
# Train for only 1000 timesteps in total
modelRtn = model.learn(total_timesteps=1000, progress_bar=True)
toc = time.perf_counter()
print(f"Elapsed time: {toc - tic} sec")
Your PPO has an n_steps parameter that is 2048 by default. collect_rollouts fills the buffer until the 2049th iteration, then execution returns to your learn method, which stops immediately because the timestep limit has already been reached: you set only 1000 for the whole learning run.
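To make the arithmetic concrete, here is a minimal sketch in plain Python (no Stable-Baselines3 needed). The helper name rollout_budget is hypothetical, not part of the library. It assumes each rollout collects n_steps * n_envs timesteps and that learn keeps starting new rollouts only while the collected count is below total_timesteps.

```python
import math


def rollout_budget(total_timesteps, n_steps=2048, n_envs=1):
    """Estimate how many rollouts fit a timestep budget (hypothetical helper).

    Assumption: every rollout collects n_steps * n_envs timesteps, and a new
    rollout starts only while fewer than total_timesteps have been collected.
    Returns (number_of_rollouts, timesteps_actually_collected).
    """
    steps_per_rollout = n_steps * n_envs
    n_rollouts = math.ceil(total_timesteps / steps_per_rollout)
    return n_rollouts, n_rollouts * steps_per_rollout


for budget in (1000, 10_000, 100_000):
    rollouts, collected = rollout_budget(budget)
    print(f"total_timesteps={budget}: {rollouts} rollout(s), "
          f"{collected} timesteps collected")
```

With the default n_steps, a budget of 1000 fits a single rollout and nothing more, so either raise total_timesteps to several multiples of n_steps or pass a smaller n_steps to the PPO constructor if you want to see repeated updates in the logs.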