I am looking at the CartPole
environment over here, and I don't see how the step
function (or any other function) ensures that the agent can't go past 500 steps -
def step(self, action):
    err_msg = f"{action!r} ({type(action)}) invalid"
    assert self.action_space.contains(action), err_msg
    assert self.state is not None, "Call reset before using step method."
    x, x_dot, theta, theta_dot = self.state
    force = self.force_mag if action == 1 else -self.force_mag
    costheta = math.cos(theta)
    sintheta = math.sin(theta)

    # For the interested reader:
    # https://coneural.org/florian/papers/05_cart_pole.pdf
    temp = (
        force + self.polemass_length * theta_dot**2 * sintheta
    ) / self.total_mass
    thetaacc = (self.gravity * sintheta - costheta * temp) / (
        self.length * (4.0 / 3.0 - self.masspole * costheta**2 / self.total_mass)
    )
    xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass

    if self.kinematics_integrator == "euler":
        x = x + self.tau * x_dot
        x_dot = x_dot + self.tau * xacc
        theta = theta + self.tau * theta_dot
        theta_dot = theta_dot + self.tau * thetaacc
    else:  # semi-implicit euler
        x_dot = x_dot + self.tau * xacc
        x = x + self.tau * x_dot
        theta_dot = theta_dot + self.tau * thetaacc
        theta = theta + self.tau * theta_dot

    self.state = (x, x_dot, theta, theta_dot)

    terminated = bool(
        x < -self.x_threshold
        or x > self.x_threshold
        or theta < -self.theta_threshold_radians
        or theta > self.theta_threshold_radians
    )

    if not terminated:
        reward = 1.0
    elif self.steps_beyond_terminated is None:
        # Pole just fell!
        self.steps_beyond_terminated = 0
        reward = 1.0
    else:
        if self.steps_beyond_terminated == 0:
            logger.warn(
                "You are calling 'step()' even though this "
                "environment has already returned terminated = True. You "
                "should always call 'reset()' once you receive 'terminated = "
                "True' -- any further steps are undefined behavior."
            )
        self.steps_beyond_terminated += 1
        reward = 0.0

    if self.render_mode == "human":
        self.render()

    return np.array(self.state, dtype=np.float32), reward, terminated, False, {}
That's not the case with Farama's Gymnasium, though; the step function there has the following line to enforce the limit -
truncated = self.steps >= self.max_episode_steps
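To illustrate what I mean, here is a minimal sketch (assuming gymnasium is installed and CartPole-v1 keeps its 500-step limit): the truncated flag comes back True if that limit is reached before the pole falls.

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
steps = 0
while True:
    # Random actions just to exercise the loop; the two flags are what matter here.
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    steps += 1
    if terminated or truncated:
        break
# truncated is True only if the step limit was hit before the pole fell over.
print(steps, terminated, truncated)
env.close()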
Unfortunately, I am required to use the gym environment, and I am currently facing the issue that the agent doesn't stop even after it crosses 500 steps.
It seems that, for registered environments, the TimeLimit wrapper
over here takes care of applying the time limit -
# Add the time limit wrapper
if max_episode_steps is not None:
    env = TimeLimit(env, max_episode_steps)
elif spec_.max_episode_steps is not None:
    env = TimeLimit(env, spec_.max_episode_steps)
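If that's right, the same wrapper can be applied by hand when the env is constructed directly instead of through gym.make (a sketch under that assumption, using the gym five-value step API shown above):

from gym.wrappers import TimeLimit
from gym.envs.classic_control.cartpole import CartPoleEnv

# Constructing CartPoleEnv directly bypasses the registry, so no TimeLimit is
# applied automatically; wrapping it by hand restores a 500-step cutoff.
env = TimeLimit(CartPoleEnv(), max_episode_steps=500)

obs, info = env.reset()
steps = 0
while True:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    steps += 1
    if terminated or truncated:
        break
print(steps, truncated)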
The __init__.py file contains the max_episode_steps for CartPole -
register(
    id="CartPole-v0",
    entry_point="gym.envs.classic_control.cartpole:CartPoleEnv",
    max_episode_steps=200,
    reward_threshold=195.0,
)
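For comparison, the v1 entry in the same file should be where the 500-step limit comes from (quoting from memory, so worth double-checking against the actual __init__.py):

register(
    id="CartPole-v1",
    entry_point="gym.envs.classic_control.cartpole:CartPoleEnv",
    max_episode_steps=500,
    reward_threshold=475.0,
)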