pytorch, python-3.7, stable-baselines

How do I specify model.learn() to end within a certain number of episodes in Stable Baselines 3?


I know total_timesteps= is a required parameter, but how do I end model.learn() after a certain number of episodes? Forgive me, I'm still new to stable_baselines3 and PyTorch and don't yet know how to implement this in code.

import gym
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make('NeuralTraffic-v1')  # custom environment, assumed to be registered with gym

# Gaussian exploration noise sized to the action space
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=60, log_interval=1)
model.save("ddpg")
env = model.get_env()

I wanted training to end at 60, but instead my rollout was:

----------------------------------
| rollout/           |           |
|    ep_len_mean     | 94        |
|    ep_rew_mean     | -2.36e+04 |
| time/              |           |
|    episodes        | 1         |
|    fps             | 0         |
|    time_elapsed    | 452       |
|    total_timesteps | 94        |
----------------------------------

I don't understand why it's only 1 episode. I'd like to learn how to restrict learning to a specified number of episodes.


Solution

  • A little late to the party, but hopefully it helps others visiting this page.

    Based on your ep_len_mean, each episode of your environment consists of 94 steps before terminating.

    Setting total_timesteps=60 means the algorithm stops training once at least 60 calls to env.step() have been made, which is less than one full episode (94 steps). With the DDPG defaults used here, data is collected one full episode at a time and the timestep limit is only checked between rollouts, which is why your log reports total_timesteps=94 and episodes=1.

    To get your desired 60 episodes instead of 60 steps, take 94 (steps per episode) x 60 (episodes desired) = 5640 (total steps required) and pass that as your total_timesteps parameter. This assumes every episode really is 94 steps long; see the sketches below for that calculation and for an episode-based alternative.
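
    As a sketch of that calculation, reusing the model from the question and assuming a fixed episode length of 94 steps:

        steps_per_episode = 94              # from ep_len_mean in the rollout log
        episodes_desired = 60
        total_steps = steps_per_episode * episodes_desired  # 5640

        model.learn(total_timesteps=total_steps, log_interval=1)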
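
    If episode lengths vary, a more direct option (not part of the original answer) is the StopTrainingOnMaxEpisodes callback that ships with Stable Baselines3: it counts completed episodes and ends training once the limit is reached, with total_timesteps acting only as an upper bound. A minimal sketch, assuming a Stable Baselines3 version that includes this callback:

        from stable_baselines3.common.callbacks import StopTrainingOnMaxEpisodes

        # Stop training once 60 episodes have been completed (counted per environment).
        stop_callback = StopTrainingOnMaxEpisodes(max_episodes=60, verbose=1)

        # total_timesteps is just a generous upper bound; the callback ends training earlier.
        model.learn(total_timesteps=1_000_000, callback=stop_callback, log_interval=1)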