I want to log each observation obtained after reset during training while using SB3.
Based on this issue message, I decided to use the Monitor wrapper instead of a callback. However, the Monitor wrapper is giving me an error.
Here is my code -
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.monitor import Monitor


class CustomMonitor(Monitor):
    def __init__(self, env, filename=None, allow_early_resets=True, reset_keywords=(), info_keywords=()):
        super(CustomMonitor, self).__init__(env)
        self.reset_observations = []

    def reset(self, **kwargs):
        observation = super(CustomMonitor, self).reset(**kwargs)
        self.reset_observations.append(observation)
        return observation


env = gym.make('LunarLander-v2')
env = CustomMonitor(env)
model = PPO('MlpPolicy', env, verbose=1)
# Train the model
model.learn(total_timesteps=1000000)
# Save the model
model.save("ppo_lunarlander_mutant")
However, after running it, I am getting the following error -
Traceback (most recent call last):
  File "minimal_example.py", line 21, in <module>
    model = PPO('MlpPolicy', env, verbose=1)
  File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/ppo/ppo.py", line 109, in __init__
    super().__init__(
  File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 85, in __init__
    super().__init__(
  File "/home/thoma/anaconda3/envs/wp/lib/python3.8/site-packages/stable_baselines3/common/base_class.py", line 180, in __init__
    assert isinstance(self.action_space, supported_action_spaces), (
AssertionError: The algorithm only supports (<class 'gymnasium.spaces.box.Box'>, <class 'gymnasium.spaces.discrete.Discrete'>, <class 'gymnasium.spaces.multi_discrete.MultiDiscrete'>, <class 'gymnasium.spaces.multi_binary.MultiBinary'>) as action spaces but Discrete(4) was provided
I was supposed to use gymnasium instead of gym (for example, import gymnasium as gym). This should have been evident from the following error, which lists only gymnasium space classes as supported:
AssertionError: The algorithm only supports (<class 'gymnasium.spaces.box.Box'>, <class 'gymnasium.spaces.discrete.Discrete'>, <class 'gymnasium.spaces.multi_discrete.MultiDiscrete'>, <class 'gymnasium.spaces.multi_binary.MultiBinary'>) as action spaces but Discrete(4) was provided
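To illustrate why the check fails even though Discrete appears in the supported list: the list names gymnasium.spaces classes, while, as far as I can tell, an environment created through gym carries a gym.spaces.Discrete, which is a different class that happens to print the same way. Below is a minimal sketch of that mismatch. GymDiscrete and GymnasiumDiscrete are hypothetical stand-ins defined locally (not the real library classes), so it runs without either package installed.

```python
class GymDiscrete:
    """Hypothetical stand-in for gym.spaces.Discrete."""

    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return f"Discrete({self.n})"


class GymnasiumDiscrete:
    """Hypothetical stand-in for gymnasium.spaces.Discrete."""

    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return f"Discrete({self.n})"


# The algorithm only accepts the "gymnasium" flavour of the class.
supported_action_spaces = (GymnasiumDiscrete,)

# An environment built with the other package hands over its own flavour.
action_space = GymDiscrete(4)

# Both classes print identically, which is why the message looks contradictory.
print(repr(action_space))
# isinstance compares class identity, not the printed name, so this is False.
print(isinstance(action_space, supported_action_spaces))
```

The printed name matches an entry in the supported list, but the class identity does not, which is the situation the assertion message describes.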
Perhaps an older version of stable_baselines3 could work with gym, but that requires further investigation.
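One related detail to watch for after switching: under the Gymnasium API, reset() returns an (observation, info) pair rather than a bare observation, so a wrapper that appends the return value of reset() unchanged would store tuples. The sketch below shows the unpacking with a stand-in environment. StandInEnv and ResetLogger are hypothetical names, not SB3 or Gymnasium classes, and I am assuming the same unpacking would be used inside the reset() of a Monitor subclass.

```python
class StandInEnv:
    """Hypothetical environment that follows the Gymnasium reset signature."""

    def __init__(self):
        self.episode = 0

    def reset(self, **kwargs):
        self.episode += 1
        observation = [float(self.episode), 0.0]  # made-up observation values
        info = {}
        return observation, info


class ResetLogger:
    """Hypothetical wrapper that records the observation from every reset()."""

    def __init__(self, env):
        self.env = env
        self.reset_observations = []

    def reset(self, **kwargs):
        # Unpack the pair so only the observation is logged.
        observation, info = self.env.reset(**kwargs)
        self.reset_observations.append(observation)
        # Return the full pair so callers still see the Gymnasium-style result.
        return observation, info


env = ResetLogger(StandInEnv())
for _ in range(3):
    env.reset()

print(env.reset_observations)
```

Returning the pair unchanged keeps the wrapper transparent to whatever calls reset(), while the log holds only observations.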