I am learning how to use Gym environments to train deep learning models built with TFLearn.
At the moment my array of observations has the following shape: (210, 160, 3)
Any recommendations on what is the best way to reshape this array so it can be used in a TensorFlow classification model?
The standard way (as described in DeepMind's DQN paper) would be:
Convert it to gray-scale, so you end up with (210, 160). Check that no useful information is lost in the conversion; in some games this can happen (for example, the ball may end up the same 'color' as the background). You can use something like:
processed = np.mean(frame, axis=2)
Downsample to (110, 84). You can use OpenCV or any other convenient library. Note that cv2.resize expects the target size as (width, height), and the interpolation flag is cv2.INTER_LINEAR:
resized = cv2.resize(processed, (84, 110), interpolation=cv2.INTER_LINEAR)
Crop the central (84, 84) part of the screen:
result = resized[13:97]
Although this is what the DeepMind paper describes, you can use simpler but still effective procedures, such as:
Convert to gray scale:
processed = np.mean(frame, axis=2)
Crop central part:
cropped = processed[35:195]
Downsample by a factor of 2 to get an (80, 80) image:
result = cropped[::2,::2]
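This simpler pipeline needs only NumPy; as a sketch (the function name preprocess_simple is my own):

```python
import numpy as np

def preprocess_simple(frame):
    """Gray-scale, crop the central play area, downsample by 2.

    `frame` is assumed to be a (210, 160, 3) uint8 observation.
    """
    gray = np.mean(frame, axis=2)      # (210, 160)
    cropped = gray[35:195]             # keep 160 central rows -> (160, 160)
    return cropped[::2, ::2]           # stride-2 downsample -> (80, 80)
```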
A similar approach was used by Andrej Karpathy in his blog.
You can also use other dimension sizes, convert to a binary image instead of gray-scale, or do anything else that works better for your application.
Once you have processed the image, you can feed it to a convolutional network, or flatten it to a 1-D array and feed it to a fully-connected network.
It's also useful to use a stack of several frames (usually 4) as the network input, together with frame skipping (after taking a frame, you skip the next 3), since consecutive frames usually do not contain much additional information.
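The frame stack can be kept in a fixed-length deque; a hypothetical helper (the class name FrameStack is my own, and frame skipping would be handled by only calling push every 4th emulator step):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Keeps the last `stack_size` processed frames as one network input."""

    def __init__(self, stack_size=4, shape=(80, 80)):
        # Start with all-zero frames so the stack is full from the first step.
        self.frames = deque(
            [np.zeros(shape) for _ in range(stack_size)], maxlen=stack_size
        )

    def push(self, frame):
        """Add a new processed frame and return the stacked input."""
        self.frames.append(frame)
        return np.stack(self.frames)   # (stack_size, H, W)
```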