I know that total_timesteps is a required parameter, but how do I end model.learn() after a certain number of episodes? Forgive me, I'm still new to stable_baselines3 and PyTorch, and I'm still not sure how to implement this in code.
import gym
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise
env = gym.make('NeuralTraffic-v1')
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=60, log_interval=1)
model.save("ddpg")
env = model.get_env()
I wanted training to end at 60 episodes, but instead my rollout was:
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 94        |
|    ep_rew_mean     | -2.36e+04 |
| time/              |           |
|    episodes        | 1         |
|    fps             | 0         |
|    time_elapsed    | 452       |
|    total_timesteps | 94        |
----------------------------------
I don't understand why there is only 1 episode. I'd like to learn how to restrict learning to a specified number of episodes.
A little late to the party, but hopefully it helps others visiting this page.
Based on your ep_len_mean value, each episode of your environment runs for 94 steps before terminating.
Setting total_timesteps to 60 gives the learning algorithm a budget of only 60 env.step() calls, which is short of one full episode (94 steps). That is why your log shows a single episode: training stopped as soon as the first episode finished.
To train for 60 episodes rather than 60 steps, multiply 94 (steps per episode) x 60 (episodes desired) = 5640 (total steps required), and pass that as your total_timesteps parameter. This assumes every episode lasts exactly 94 steps; if episode lengths vary, the episode count will only be approximate.
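Here is a minimal sketch of that arithmetic, plus one way to stop on an episode count directly instead of guessing a step count. The helper names `episodes_to_timesteps` and `learn_for_episodes` and the `slack` parameter are made up for illustration. `StopTrainingOnMaxEpisodes` is a callback from `stable_baselines3.common.callbacks`; I'm assuming your installed version ships it, so treat the second function as a starting point rather than tested code for your environment.

```python
def episodes_to_timesteps(steps_per_episode: int, n_episodes: int) -> int:
    """Step budget for n_episodes, assuming every episode has the same length."""
    if steps_per_episode <= 0 or n_episodes <= 0:
        raise ValueError("both arguments must be positive")
    return steps_per_episode * n_episodes


def learn_for_episodes(model, n_episodes: int, steps_per_episode: int, slack: int = 2):
    """Hypothetical helper: end model.learn() by episode count, not step count."""
    # Imported lazily so the arithmetic helper above works without SB3 installed.
    from stable_baselines3.common.callbacks import StopTrainingOnMaxEpisodes

    callback = StopTrainingOnMaxEpisodes(max_episodes=n_episodes, verbose=1)
    # total_timesteps is only an upper bound here. The callback should end
    # training first, as long as episodes do not run much longer than expected
    # (the slack factor is an assumption, adjust it for your environment).
    budget = slack * episodes_to_timesteps(steps_per_episode, n_episodes)
    return model.learn(total_timesteps=budget, callback=callback, log_interval=1)


# The numbers from the question: 94 steps per episode, 60 episodes wanted.
print(episodes_to_timesteps(94, 60))  # 5640
```

With the model from the question, you would call something like `learn_for_episodes(model, n_episodes=60, steps_per_episode=94)`. The callback route is the safer choice when episode lengths are not fixed, since it counts finished episodes rather than relying on the 94-step estimate.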