
How do I log observations after reset in Stable-Baselines3?


While training with SB3, I want to log every observation returned by reset.

Based on this issue message, I decided to use the Monitor wrapper instead of a callback.

However, my Monitor subclass fails with an error. Here is my code:

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback

from stable_baselines3.common.monitor import Monitor

class CustomMonitor(Monitor):
    def __init__(self, env, filename=None, allow_early_resets=True, reset_keywords=(), info_keywords=()):
        super(CustomMonitor, self).__init__(env)
        self.reset_observations = []

    def reset(self, **kwargs):
        observation = super(CustomMonitor, self).reset(**kwargs)
        self.reset_observations.append(observation)
        return observation

env = gym.make('LunarLander-v2')
env = CustomMonitor(env)

model = PPO('MlpPolicy', env, verbose=1)
# Train the model
model.learn(total_timesteps=1000000)

# Save the model
model.save("ppo_lunarlander_mutant")


Running it produces the following error:

Traceback (most recent call last):
  File "minimal_example.py", line 21, in <module>
    model = PPO('MlpPolicy', env, verbose=1)
  File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/ppo/ppo.py", line 109, in __init__
    super().__init__(
  File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 85, in __init__
    super().__init__(
  File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/common/base_class.py", line 180, in __init__
    assert isinstance(self.action_space, supported_action_spaces), (
AssertionError: The algorithm only supports (<class 'gymnasium.spaces.box.Box'>, <class 'gymnasium.spaces.discrete.Discrete'>, <class 'gymnasium.spaces.multi_discrete.MultiDiscrete'>, <class 'gymnasium.spaces.multi_binary.MultiBinary'>) as action spaces but Discrete(4) was provided


Solution

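  • I was supposed to use gymnasium instead of gym. The error message makes this evident: every supported space class it lists comes from gymnasium, while my environment was created with gym, so its Discrete(4) action space is an instance of a different class and fails the isinstance check: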

    AssertionError: The algorithm only supports (<class 'gymnasium.spaces.box.Box'>, <class 'gymnasium.spaces.discrete.Discrete'>, <class 'gymnasium.spaces.multi_discrete.MultiDiscrete'>, <class 'gymnasium.spaces.multi_binary.MultiBinary'>) as action spaces but Discrete(4) was provided

    An older version of stable_baselines3 might still work with gym, but I have not verified this.