[SOLVED] How to stop the learning process with PPO in stable_baselines3?

I created a custom environment based on gymnasium and I want to train a PPO agent from stable_baselines3 on it. I'm using version 2.0.0a5 of stable_baselines3 because it supports gymnasium. I have the following code:

from stable_baselines3 import PPO

env = MyEnv()  # custom gymnasium environment (definition not shown)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1, progress_bar=True)

This code does not stop: the progress bar goes past the total number of timesteps and keeps going. I may be doing something wrong in the environment, but I am not sure what, or why it would make the learning process run for more steps than the total_timesteps I set.

So, what could go wrong with the environment? What should I check that could make the learning process infinite?

Edit: the plot thickens. I tried the same thing with an SAC agent, and it does not go into an infinite loop during learning. But it does go into one during evaluation!

Solution

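I encountered a similar issue, and in my case it was entirely due to the n_steps parameter. PPO collects a full rollout of n_steps steps per environment before each update, and learn only compares the step count against total_timesteps between rollouts. By default n_steps is 2048 (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html), so even with total_timesteps=1 the learn call runs at least 2048 environment steps. That is why the progress bar overshoots and the call takes a long time to return. You can lower n_steps to shorten each rollout: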

model = PPO("MlpPolicy", env, verbose=1, n_steps=10)

This makes learn return much sooner, since each rollout is now only 10 steps long.
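To make the rounding concrete, here is a rough sketch of the arithmetic in plain Python. The function effective_timesteps is a hypothetical helper written for this answer, not part of stable_baselines3. It assumes training always proceeds in whole rollouts of n_steps * n_envs steps and that no callback stops it early.

```python
import math


def effective_timesteps(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> int:
    """Estimate how many environment steps PPO runs before learn() returns.

    Hypothetical helper for illustration only (not a stable_baselines3 API).
    Assumes the step budget is only checked between complete rollouts.
    """
    rollout_size = n_steps * n_envs  # steps collected before each policy update
    n_rollouts = math.ceil(total_timesteps / rollout_size)  # always whole rollouts
    return n_rollouts * rollout_size


if __name__ == "__main__":
    print(effective_timesteps(1))               # default n_steps: 2048
    print(effective_timesteps(1, n_steps=10))   # 10
    print(effective_timesteps(25, n_steps=10))  # 30
```

So the number shown on the progress bar should stop at the next multiple of n_steps * n_envs. If it keeps growing well past that, the cause is probably somewhere else, such as the environment itself.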