python jupyter reinforcement-learning openai-gym

Too many / Not enough values in OpenAI Gym Mario Model for Reinforcement Learning


OpenAI Gym can be used to train a reinforcement learning model that plays Super Mario Bros. I tried doing this by following Nicholas Renotte's YouTube tutorial, but around the 10-minute mark I get one of two errors: "too many values to unpack (expected 4)" or "not enough values to unpack (expected 5, got 4)".

The error is raised where the result of env.step is unpacked into 4 variables in the loop, but I think it originates from where "env" is instantiated.

From Jupyter Notebook:

#!pip install gym_super_mario_bros==7.3.0 nes_py 
import gym_super_mario_bros #import game

from nes_py.wrappers import JoypadSpace #import wrapper

from gym_super_mario_bros.actions import SIMPLE_MOVEMENT #import basic movements

# Initialize the game

env = gym_super_mario_bros.make('SuperMarioBros-v0', apply_api_compatibility=True, render_mode="human")

#env = gym_super_mario_bros.make('SuperMarioBros-v0')

# make() creates the named environment; more environments are listed on the Gym website.

print(env.action_space) #this shows there are 256 actions (complex)

env = JoypadSpace(env, SIMPLE_MOVEMENT) 
# This wraps the environment so that it only accepts the SIMPLE_MOVEMENT inputs

print(env.action_space) #This shows there are 7 available actions (simplified)

print(env.observation_space.shape)

print(env.observation_space)

print((env.action_space.sample()))

done = True # Create a flag when finished to know when to restart

for step in range(100000): # Loop through each frame in the game

    if done: 

        # Start the game

        env.reset()

    # Do random actions
    state, reward, done, info = env.step(env.action_space.sample())

    # Show the game on the screen

    env.render()
# Close the game
env.close()
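
The number of values env.step returns depends on the installed Gym version (the step API moved to five values in Gym 0.26), so it can help to record which versions are in the notebook's environment. A minimal sketch using only the standard library; installed_version is a hypothetical helper name, and the distribution names passed to it are assumptions about how the packages were installed:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names below are assumptions; adjust them to match `pip list`.
for name in ("gym", "gym-super-mario-bros", "nes-py"):
    print(name, installed_version(name))
```

A result of None means the package was not found under that name in the current environment.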

Solution

  • The issue is with this line: state, reward, done, info = env.step(env.action_space.sample()). You're unpacking the result of env.step into 4 variables, but it returns 5 values here (the newer Gym step API adds a truncated flag). See the Gym documentation for the step function.

    Replace it with this:

    state, reward, done, truncated, info = env.step(env.action_space.sample())
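
    If the same notebook has to run against both the old and the new step API, one option is to normalize the result before unpacking it. The sketch below is only an illustration: normalize_step is a hypothetical helper, not part of Gym, and it assumes that an episode should count as finished when either the terminated or the truncated flag is set.

```python
def normalize_step(result):
    """Return (state, reward, done, info) from a 4- or 5-value step result."""
    if len(result) == 5:
        # Newer API: observation, reward, terminated, truncated, info
        state, reward, terminated, truncated, info = result
        return state, reward, bool(terminated or truncated), info
    if len(result) == 4:
        # Older API: observation, reward, done, info
        state, reward, done, info = result
        return state, reward, bool(done), info
    raise ValueError(f"unexpected step result of length {len(result)}")


# Usage inside the loop from the question (needs the Mario environment):
# state, reward, done, info = normalize_step(env.step(env.action_space.sample()))
```

    With this helper the "if done: env.reset()" check in the loop also restarts the game when an episode is truncated, which the plain five-variable unpacking does not do on its own.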