reinforcement-learning, ray, rllib, ray-tune

How to end episodes after 200 steps in Ray Tune (tune.run()) using a PPO model with torch


I'm using the following code to import a custom environment and then train on it:


from ray.tune.registry import register_env
import ray
from ray import air, tune
from ray.rllib.algorithms.ppo import PPO

from gym_env.cube_env import CubeEnv

select_env = "2x2x2_cube-v0"
register_env(select_env, lambda config: CubeEnv())

ray.init()

tune.run(PPO, config={
    "env": "2x2x2_cube-v0",
    "framework": "torch",
    "log_level": "INFO",
    "batch_mode": "truncate_episodes",
    "rollout_fragment_length": 200
})

Despite specifying "batch_mode": "truncate_episodes" and "rollout_fragment_length": 200, my episodes do not end even after 4000 steps.

I also get a warning that "Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset."

The difficulty of my randomly generated environment can vary significantly, so it would be better if the environment simply reset whenever the model gets stuck after 200 steps.

Do I need to add this "max_episode_steps" attribute to my environment (and if so how) or can I set this directly in tune.run() config?

I've tried using the Ray website, but I cannot find any documentation, and many of the tutorials available there use out-of-date code.

I've tried copying some configs from different PPO examples I could find, such as "batch_mode": "truncate_episodes" and "rollout_fragment_length": 200, but they don't seem to do what I want.


Solution

  • You can wrap your env in gymnasium's TimeLimit wrapper (see the gymnasium documentation).

    As an extra argument, you pass max_episode_steps (200 in your case). Once that many steps have been taken, the wrapper sets the truncated flag returned by step() to True, which ends the episode.

    So in your case it'd be:

    from gymnasium.wrappers import TimeLimit

    # Truncate every episode after 200 steps
    env = TimeLimit(CubeEnv(), max_episode_steps=200)
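
    For completeness, here is a minimal sketch of how the wrapped env could be plugged into the register_env / tune.run() setup from your question. It assumes CubeEnv follows the gymnasium API (i.e. step() returns a truncated flag), since RLlib treats truncated=True as the end of an episode:

    from gymnasium.wrappers import TimeLimit
    from ray.tune.registry import register_env
    from ray import tune
    from ray.rllib.algorithms.ppo import PPO

    from gym_env.cube_env import CubeEnv

    select_env = "2x2x2_cube-v0"

    # Every rollout worker builds a time-limited copy of the env
    register_env(select_env, lambda config: TimeLimit(CubeEnv(), max_episode_steps=200))

    tune.run(PPO, config={
        "env": select_env,
        "framework": "torch",
    })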