I created a custom environment based on gymnasium and I want to train an agent on it with PPO from stable_baselines3. I'm using version 2.0.0a5 of the latter, since that version supports gymnasium. I have the following code:
from stable_baselines3 import PPO

env = MyEnv()  # my custom gymnasium environment
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1, progress_bar=True)
This code does not stop: the progress bar goes past the total number of timesteps and just keeps going. I may be doing something wrong with the environment, but I am not sure what, or why it would make the learning process run for more iterations than the total_timesteps fixed by the user.
So, what could go wrong with the environment? What should I check that could make the learning process infinite?
Edit: the plot thickens. I tried the same thing with an SAC agent and it does not go into an infinite loop during learning. But it does go into one during evaluation!
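One thing worth checking for the evaluation hang is whether episodes ever end. Evaluation typically runs a fixed number of complete episodes, so an environment that never returns terminated=True or truncated=True would keep it running forever, while a learning loop that counts timesteps would still stop. The sketch below is only an illustration under that assumption: episode_length, NeverEndingEnv, and CappedEnv are hypothetical names, and the toy classes merely mimic the gymnasium reset/step return shapes.

```python
def episode_length(env, sample_action, max_steps=10_000):
    """Return how many steps one episode takes, or None if the episode
    is still running after max_steps (hypothetical helper)."""
    env.reset()
    for step in range(1, max_steps + 1):
        _obs, _reward, terminated, truncated, _info = env.step(sample_action())
        if terminated or truncated:
            return step
    return None


class NeverEndingEnv:
    """Toy stand-in for a custom environment whose episodes never end."""

    def reset(self, seed=None, options=None):
        return 0, {}

    def step(self, action):
        # terminated and truncated are always False: the episode never ends.
        return 0, 0.0, False, False, {}


class CappedEnv(NeverEndingEnv):
    """Same toy environment, but truncated after a fixed number of steps."""

    def __init__(self, max_episode_steps=5):
        self.max_episode_steps = max_episode_steps
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.max_episode_steps
        return 0, 0.0, False, truncated, {}


print(episode_length(NeverEndingEnv(), lambda: 0, max_steps=100))  # None
print(episode_length(CappedEnv(), lambda: 0, max_steps=100))       # 5
```

With a real gymnasium environment, env.action_space.sample can be passed as sample_action. If the check returns None, wrapping the environment in gymnasium.wrappers.TimeLimit with a max_episode_steps value is one way to force truncation.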
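I encountered a similar issue, and in my case it was entirely due to the n_steps parameter. By default, this value is set to 2048 (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html). PPO collects n_steps steps per environment before each update, so learn runs for at least 2048 timesteps even when total_timesteps=1. That is why the progress bar goes past the total and the call takes a long time rather than looping forever. You can lower n_steps to speed up the process: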
model = PPO("MlpPolicy", env, verbose=1, n_steps=10)
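Each rollout is then only 10 steps long, so learn(total_timesteps=1) finishes after 10 steps instead of 2048. Note that stable_baselines3 may warn that the rollout buffer (n_steps * n_envs = 10) is smaller than the default batch_size of 64; passing batch_size=10 as well should avoid that.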
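To make the arithmetic concrete, here is a minimal sketch of how many timesteps end up being collected, assuming the learning loop gathers whole rollouts of n_steps * n_envs steps and only checks total_timesteps between rollouts. The function timesteps_actually_collected is a hypothetical helper written for illustration, not part of stable_baselines3.

```python
import math


def timesteps_actually_collected(total_timesteps, n_steps=2048, n_envs=1):
    """Timesteps gathered when learning proceeds in whole rollouts.

    Assumption: a full rollout of n_steps * n_envs steps is collected
    each time the running step count is still below total_timesteps.
    """
    rollout_size = n_steps * n_envs
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    return n_rollouts * rollout_size


print(timesteps_actually_collected(1))              # 2048
print(timesteps_actually_collected(1, n_steps=10))  # 10
print(timesteps_actually_collected(5000))           # 6144
```

Under this assumption the progress bar overshooting total_timesteps is expected whenever total_timesteps is not a multiple of the rollout size.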