reinforcement-learning, openai-gym, pettingzoo

Gymnasium/PettingZoo: Getting Tic Tac Toe to show ANSI text


Using the Tic Tac Toe environment:

from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env(render_mode="ansi")
env.reset(seed=1)

env.step(1)
print(env.render())

This prints an empty string ('') and also launches an unnecessary Python window that cannot be opened. With render_mode="human" the board is displayed correctly in the new window, and with render_mode="rgb_array" a long array is printed to the terminal.

I just want a text output of my tic tac toe board. What am I missing?


Solution

  • It is not clear to me why the window opens; with the ansi render mode I would also expect the rendering to happen in the terminal. I guess this is a glitch in the tic-tac-toe implementation. What you seem to be looking for is a representation of the environment state. However, according to the documentation (see state()), not all environments support this. The documentation is a bit misleading about these rendering methods.

    Sure enough, for tic-tac-toe:

    > env.state()
    
    NotImplementedError: state() method has not been implemented in the environment tictactoe_v3.
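
Since state() may or may not be implemented depending on the environment, one way to handle this is to try it and fall back to something else when it raises. The following is only a minimal sketch: state_or_fallback and _NoStateEnv are hypothetical names introduced for illustration, and the stand-in class merely mimics the error shown above rather than being a real PettingZoo environment.

```python
def state_or_fallback(env, fallback):
    """Return env.state() if the environment implements it, else fallback(env)."""
    try:
        return env.state()
    except NotImplementedError:
        # The environment does not expose a global state; use the caller's function.
        return fallback(env)


class _NoStateEnv:
    """Hypothetical stand-in that mimics an environment without state()."""

    def state(self):
        raise NotImplementedError("state() method has not been implemented")


# The stand-in raises, so the fallback callable is used instead.
result = state_or_fallback(_NoStateEnv(), lambda env: "fallback used")
print(result)
```

With a real environment you would pass your own function as fallback, for example one that rebuilds the board from a player's observation.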
    

    The environment computes the board state internally in order to render it (to a window, unfortunately), but that computation is not directly accessible. In this game, however, the state is equal to the observations of both players, who see the whole board after every turn, so you can implement your own state method on top of the observation. Here is my version:

    import numpy as np
    from pettingzoo.classic import tictactoe_v3
    
    env = tictactoe_v3.raw_env(render_mode=None)
    env.reset(seed=42)
    
    env.step(0)
    env.step(1)
    env.step(2)
    env.step(3)
    env.step(4)
    
    
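    def get_state(env):
        # The observation has shape (3, 3, 2): plane 0 holds player_1's marks,
        # plane 1 holds the opponent's marks.
        obs = env.observe("player_1")["observation"]
        # Flattening interleaves the planes: even indices are plane 0, odd are plane 1.
        rvl = obs.ravel()
    
        # Weights that encode player_1's marks as 1 and the opponent's marks as 2.
        arr = np.empty(rvl.shape, int)
        arr[::2] = 1
        arr[1::2] = 2
    
        rvl *= arr
    
        # Split the planes back into 3x3 grids and transpose them into display orientation.
        grp_x = np.array(rvl[::2]).reshape(3, 3).T
        grp_o = np.array(rvl[1::2]).reshape(3, 3).T
        res = grp_x + grp_o
    
        # Map the numeric codes to printable marks.
        dct = {1: "X", 2: "O", 0: " "}
        return np.vectorize(dct.get)(res)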
    
    
    res = get_state(env)
    print(res)
    
    env.close()
    

    Result:

    [['X' 'O' ' ']
     ['O' 'X' ' ']
     ['X' ' ' ' ']]
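
If you would rather get a single printable string than an array of characters, the same idea can be sketched without NumPy. This is only an illustration under stated assumptions: observation_to_text is a hypothetical helper, it assumes a 3x3x2 observation where plane 0 holds the observing player's marks and plane 1 the opponent's, and it transposes rows and columns in the same way as the code above. The demo input is built by hand, not produced by PettingZoo.

```python
def observation_to_text(obs, own_mark="X", other_mark="O"):
    """Render a 3x3x2 tic-tac-toe observation as a multi-line string.

    Assumes plane 0 marks the observing player's squares and plane 1 the
    opponent's. Rows and columns are swapped (obs[c][r]) to mirror the
    transpose used in get_state above.
    """
    rows = []
    for r in range(3):
        cells = []
        for c in range(3):
            if obs[c][r][0]:
                cells.append(own_mark)
            elif obs[c][r][1]:
                cells.append(other_mark)
            else:
                cells.append(" ")
        rows.append(" | ".join(cells))
    return "\n--+---+--\n".join(rows)


# Hand-built stand-in for an observation (hypothetical data).
obs = [[[0, 0] for _ in range(3)] for _ in range(3)]
for i, j in [(0, 0), (0, 2), (1, 1)]:   # observing player's marks (plane 0)
    obs[i][j][0] = 1
for i, j in [(0, 1), (1, 0)]:           # opponent's marks (plane 1)
    obs[i][j][1] = 1

text = observation_to_text(obs)
print(text)
```

Because the helper only uses plain indexing, it should also accept the NumPy array returned by env.observe("player_1")["observation"], although that has not been checked against every PettingZoo version.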