I am learning how to use Gym environments to train deep learning models built with TFLearn.
At the moment my array of observations has the following shape: (210, 160, 3)
Any recommendations on what is the best way to reshape this array so it can be used in a TensorFlow classification model?
The standard way (as described in DeepMind's DQN paper) would be:
Convert it to gray-scale, so you end up with (210, 160). Check that no useful information is lost in the conversion; in some games this can happen (for example, the ball may end up the same 'color' as the background). You can use something like:
processed = np.mean(frame, axis=2)
Downsample to (110, 84). You can use OpenCV or any other convenient library. Note that cv2.resize expects the target size as (width, height), and the interpolation flag is cv2.INTER_LINEAR:
resized = cv2.resize(processed, (84, 110), interpolation=cv2.INTER_LINEAR)
Crop the central (84, 84) part of the screen:
result = resized[13:97]
Although this is what the DeepMind paper describes, you can use simpler but still effective procedures, such as:
Convert to gray scale:
processed = np.mean(frame, axis=2)
Crop central part:
cropped = processed[35:195]
Downsample by a factor of 2 to get an (80, 80) image:
result = cropped[::2,::2]
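This simpler pipeline needs only NumPy; as a sketch (the function name preprocess_simple is my own):

```python
import numpy as np

def preprocess_simple(frame):
    """Gray-scale, crop the central play area, downsample by 2.

    `frame` is assumed to be a (210, 160, 3) uint8 observation.
    """
    gray = np.mean(frame, axis=2)      # (210, 160)
    cropped = gray[35:195]             # keep 160 central rows -> (160, 160)
    return cropped[::2, ::2]           # stride-2 downsample -> (80, 80)
```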
A similar approach was used by Andrej Karpathy in his blog.
You can also use other dimension sizes, convert to a binary image instead of gray-scale, or do anything else that works better for your application.
Once you have processed the image, you can feed it to a convolutional network, or flatten it to a 1-D array and feed it to a fully-connected network.
It's also useful to use a stack of several frames (usually 4) as the network input, together with frame skipping (after taking a frame, you skip the next 3), since consecutive frames usually do not contain much additional information.
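The frame stack can be kept in a fixed-length deque; a hypothetical helper (the class name FrameStack is my own, and frame skipping would be handled by only calling push every 4th emulator step):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Keeps the last `stack_size` processed frames as one network input."""

    def __init__(self, stack_size=4, shape=(80, 80)):
        # Start with all-zero frames so the stack is full from the first step.
        self.frames = deque(
            [np.zeros(shape) for _ in range(stack_size)], maxlen=stack_size
        )

    def push(self, frame):
        """Add a new processed frame and return the stacked input."""
        self.frames.append(frame)
        return np.stack(self.frames)   # (stack_size, H, W)
```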