python, tensorflow, reinforcement-learning, tflearn, openai-gym

Reshaping a Gym array for TensorFlow


I am learning how to use Gym environments to train deep learning models built with TFLearn.

At the moment my array of observations has the following shape: (210, 160, 3)

Any recommendations on what is the best way to reshape this array so it can be used in a TensorFlow classification model?


Solution

  • The standard way (as described in the DQN paper by DeepMind) would be:

    1. Convert it to grayscale, so you end up with (210, 160). Check here that no useful information is lost; in some games this can happen (for example, the ball becomes the same 'color' as the background). You can use something like:

      processed = np.mean(frame, axis=2, keepdims=False)

    2. Downsample to (110, 84). You can use OpenCV or any other convenient library; note that cv2.resize takes the target size as (width, height), so (84, 110) produces a (110, 84) array:

      resized = cv2.resize(processed, (84, 110), interpolation=cv2.INTER_LINEAR)

    3. Crop the central part of the screen to get (84, 84):

      result = resized[13:97]
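    The three steps above can be combined into one function. A minimal sketch, assuming NumPy and OpenCV are installed (the function name is mine):

    ```python
    import numpy as np
    import cv2

    def preprocess_dqn(frame):
        """DQN-style preprocessing: grayscale -> (110, 84) resize -> (84, 84) crop."""
        # Average over the color channel: (210, 160, 3) -> (210, 160)
        gray = np.mean(frame, axis=2).astype(np.float32)
        # cv2.resize takes (width, height), so (84, 110) yields a (110, 84) array
        resized = cv2.resize(gray, (84, 110), interpolation=cv2.INTER_LINEAR)
        # Keep the central 84 rows -> (84, 84)
        return resized[13:97]

    frame = np.zeros((210, 160, 3), dtype=np.uint8)
    print(preprocess_dqn(frame).shape)  # (84, 84)
    ```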

    Although this is what was described in the DeepMind paper, you can use a more convenient but still effective procedure, such as:

    1. Convert to grayscale:

      processed = np.mean(frame, axis=2, keepdims=False)

    2. Crop the central part:

      cropped = processed[35:195]

    3. Downsample by a factor of 2 to get an (80, 80) image:

      result = cropped[::2,::2]
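    This simpler pipeline needs only NumPy. A sketch (the function name is mine):

    ```python
    import numpy as np

    def preprocess_simple(frame):
        """Grayscale -> crop rows 35:195 -> downsample by 2 -> (80, 80)."""
        gray = np.mean(frame, axis=2)   # (210, 160, 3) -> (210, 160)
        cropped = gray[35:195]          # keep the play area -> (160, 160)
        return cropped[::2, ::2]        # every 2nd pixel -> (80, 80)

    frame = np.zeros((210, 160, 3), dtype=np.uint8)
    print(preprocess_simple(frame).shape)  # (80, 80)
    ```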

    A similar approach was used by Andrej Karpathy in his blog.

    You can also use other dimension sizes, convert to a binary image instead of grayscale, or do anything else that works better for your application.

    Once you have processed the image, you can feed it to a convolutional network, or flatten it to a 1-D array and feed it to a fully-connected network.
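    For example, a sketch of both reshaping options (the batch dimension of 1 is just for illustration):

    ```python
    import numpy as np

    processed = np.zeros((80, 80), dtype=np.float32)

    # Convolutional network: add batch and channel dimensions -> (1, 80, 80, 1)
    conv_input = processed.reshape(1, 80, 80, 1)

    # Fully-connected network: flatten to a 1-D vector -> (1, 6400)
    dense_input = processed.reshape(1, -1)

    print(conv_input.shape, dense_input.shape)  # (1, 80, 80, 1) (1, 6400)
    ```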

    It's also useful to use a stack of several frames (usually 4) as the network input, together with frame skipping (after taking a frame, you skip the next 3 frames), since consecutive frames usually do not contain much additional information.
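    A minimal sketch of frame stacking (the `FrameStacker` class is my own illustration; with frame skipping you would push only every 4th processed frame):

    ```python
    import numpy as np
    from collections import deque

    class FrameStacker:
        """Keeps the last `num_frames` processed frames and stacks them
        along a new channel axis, the usual layout for a conv network."""
        def __init__(self, num_frames=4, shape=(80, 80)):
            # Start with all-zero frames so the stack has a fixed shape from step 1
            self.frames = deque([np.zeros(shape, dtype=np.float32)] * num_frames,
                                maxlen=num_frames)

        def push(self, frame):
            self.frames.append(frame)  # oldest frame is dropped automatically
            return np.stack(self.frames, axis=-1)  # (80, 80, num_frames)

    stacker = FrameStacker()
    state = stacker.push(np.ones((80, 80), dtype=np.float32))
    print(state.shape)  # (80, 80, 4)
    ```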