reinforcement-learning, openai-gym

How does `OpenAI Gym` keep track of steps exceeding 500 in the CartPole environment?


I am looking at the CartPole environment over here, and I don't see how the `step` function (or any other function) ensures that the agent doesn't cross 500 steps -

    def step(self, action):
        err_msg = f"{action!r} ({type(action)}) invalid"
        assert self.action_space.contains(action), err_msg
        assert self.state is not None, "Call reset before using step method."
        x, x_dot, theta, theta_dot = self.state
        force = self.force_mag if action == 1 else -self.force_mag
        costheta = math.cos(theta)
        sintheta = math.sin(theta)

        # For the interested reader:
        # https://coneural.org/florian/papers/05_cart_pole.pdf
        temp = (
            force + self.polemass_length * theta_dot**2 * sintheta
        ) / self.total_mass
        thetaacc = (self.gravity * sintheta - costheta * temp) / (
            self.length * (4.0 / 3.0 - self.masspole * costheta**2 / self.total_mass)
        )
        xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass

        if self.kinematics_integrator == "euler":
            x = x + self.tau * x_dot
            x_dot = x_dot + self.tau * xacc
            theta = theta + self.tau * theta_dot
            theta_dot = theta_dot + self.tau * thetaacc
        else:  # semi-implicit euler
            x_dot = x_dot + self.tau * xacc
            x = x + self.tau * x_dot
            theta_dot = theta_dot + self.tau * thetaacc
            theta = theta + self.tau * theta_dot

        self.state = (x, x_dot, theta, theta_dot)

        terminated = bool(
            x < -self.x_threshold
            or x > self.x_threshold
            or theta < -self.theta_threshold_radians
            or theta > self.theta_threshold_radians
        )

        if not terminated:
            reward = 1.0
        elif self.steps_beyond_terminated is None:
            # Pole just fell!
            self.steps_beyond_terminated = 0
            reward = 1.0
        else:
            if self.steps_beyond_terminated == 0:
                logger.warn(
                    "You are calling 'step()' even though this "
                    "environment has already returned terminated = True. You "
                    "should always call 'reset()' once you receive 'terminated = "
                    "True' -- any further steps are undefined behavior."
                )
            self.steps_beyond_terminated += 1
            reward = 0.0

        if self.render_mode == "human":
            self.render()
        # note: `truncated` (the fourth element) is hardcoded to False here
        return np.array(self.state, dtype=np.float32), reward, terminated, False, {}

That's not the case with Farama Gymnasium, though. The `step` function there has the following line to enforce the limit -

    truncated = self.steps >= self.max_episode_steps



Unfortunately, I am expected to run the gym environment, and I am currently facing an issue where the agent doesn't stop even after it crosses 500 steps.
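A minimal reproduction of what I suspect is happening (this is an assumption on my part, using gym >= 0.26): when `CartPoleEnv` is instantiated directly instead of through `gym.make`, no `TimeLimit` wrapper is applied, so `truncated` stays `False` forever and only a pole fall or the cart leaving the track ends the episode -

    from gym.envs.classic_control.cartpole import CartPoleEnv

    env = CartPoleEnv()  # raw env, no TimeLimit wrapper applied
    env.reset(seed=0)
    terminated = truncated = False
    steps = 0
    while not (terminated or truncated):
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        steps += 1
        # `truncated` is always False here, so nothing caps the episode at
        # 500 steps; the loop ends only when `terminated` becomes True.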


Solution

  • It seems that for registered environments, the `TimeLimit` wrapper, applied over here inside `gym.make`, takes care of the time limit -

        # Add the time limit wrapper
        if max_episode_steps is not None:
            env = TimeLimit(env, max_episode_steps)
        elif spec_.max_episode_steps is not None:
            env = TimeLimit(env, spec_.max_episode_steps)
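
    A quick way to confirm the wrapper is active (a sketch, assuming gym >= 0.26; the random action is just a placeholder for an agent):

        import gym

        env = gym.make("CartPole-v1")      # make() applies TimeLimit automatically
        print(env.spec.max_episode_steps)  # -> 500 for CartPole-v1

        obs, info = env.reset(seed=0)
        done = False
        while not done:
            obs, reward, terminated, truncated, info = env.step(
                env.action_space.sample()
            )
            # TimeLimit sets truncated=True once the step limit is reached,
            # so this loop can never run past 500 steps.
            done = terminated or truncated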
    

    The `gym/envs/__init__.py` file contains the `max_episode_steps` for each CartPole version -

    register(
        id="CartPole-v0",
        entry_point="gym.envs.classic_control.cartpole:CartPoleEnv",
        max_episode_steps=200,
        reward_threshold=195.0,
    )