python, tensorflow, conv-neural-network, padding, max-pooling

Accuracy impact when not using stride and padding for maxpooling layers in a CNN


I'm testing out parameters for my CNN, which aims to classify images into three classes. Images are 224x224, and the architecture is very basic: 3 convolutional layers (32, 32 and 64 filters) with ReLU activation, each followed by a max-pooling layer. After the 3 convolutional layers there's an FC layer with 256 units and a 3-neuron FC layer with softmax activation. After some tests, I realized that training and testing accuracy stayed constant at around 33% (the classes are uniformly distributed, so that's pure guessing) no matter how I changed the hyperparameters. Then I noticed that I had forgotten to add padding='same' and strides for the last two max-pooling layers. After correcting that, the results are pretty good (around 93% accuracy) and there is no overfitting.

I'm wondering why forgetting padding='same' and strides=2 leads to a situation where the network's predictions are pure guessing. I looked for this kind of issue but couldn't find any explanation. How could the feature map size that remains after convolution, or the failure to take edge information into account, lead to such poor accuracy? I'm using TensorFlow and Keras.

Thank you very much.


Solution

  • There are two issues that arise with convolution:

    After every convolution operation, the original image size shrinks:

    #!/usr/bin/python
    # import necessary modules
    from keras.models import Sequential
    from keras.layers import Conv2D

    model = Sequential()
    # a single 3x3 filter with stride 2 on a 5x5 input gives a 2x2 output
    model.add(Conv2D(1, (3,3), strides=(2, 2), input_shape=(5, 5, 1)))
    model.summary()
    

    As we have seen in the example above, an image classification task typically has multiple convolution layers, so after several convolution operations the original image gets very small.
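
    For example, stacking a few 3x3 convolutions with the default 'valid' padding (a minimal sketch with arbitrary filter counts) shows the spatial size dropping at every layer:

    from keras.models import Sequential
    from keras.layers import Conv2D

    model = Sequential()
    model.add(Conv2D(8, (3,3), input_shape=(10, 10, 1)))  # 10x10 -> 8x8
    model.add(Conv2D(8, (3,3)))                           # 8x8  -> 6x6
    model.add(Conv2D(8, (3,3)))                           # 6x6  -> 4x4
    model.summary()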

    The second issue is that, as the kernel moves over the original image, it covers the edges of the image fewer times than the middle, where its positions also overlap. As a result, features in the corners and along the edges of the image contribute little to the output.

    To solve these two issues, a concept called padding is introduced. Padding preserves the size of the original image.

    So, if an n x n matrix is convolved with an f x f filter using padding p, the size of the output image will be (n + 2p - f + 1) x (n + 2p - f + 1); for a 3x3 filter, 'same' padding corresponds to p = 1.
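
    As a quick sanity check of this formula, a tiny helper (the name conv_output_size is made up here for illustration) can compute the output size:

    def conv_output_size(n, f, p=0):
        # output spatial size for an n x n input, an f x f filter and padding p (stride 1)
        return n + 2 * p - f + 1

    print(conv_output_size(5, 3))       # no padding: 3
    print(conv_output_size(5, 3, p=1))  # p = 1 ('same' for a 3x3 filter): 5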

    Stride

    Stride is the number of pixels by which the filter shifts over the input matrix at each step. For an input image of size n x n, filter size f x f, padding p and stride s, the output image dimension will be [floor((n + 2p - f) / s) + 1] x [floor((n + 2p - f) / s) + 1].

    # 'weights' (a fixed 3x3 filter plus bias) and 'data' (a sample 5x5 input array)
    # are assumed to be defined earlier in the original answer
    model = Sequential()
    model.add(Conv2D(1, (3,3), strides=(2, 2), input_shape=(5, 5, 1)))
    model.summary()
    model.set_weights(weights)
    yhat = model.predict(data)
    # print the resulting 2x2 feature map row by row
    for r in range(yhat.shape[1]):
        print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
    Model: "sequential_3"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    conv2d_3 (Conv2D)            (None, 2, 2, 1)           10        
    =================================================================
    Total params: 10
    Trainable params: 10
    Non-trainable params: 0
    _________________________________________________________________
    [12.0, 17.0]
    [9.0, 14.0] 
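
    Plugging this example into the stride formula above (n = 5, f = 3, p = 0, s = 2) gives floor((5 + 0 - 3) / 2) + 1 = 2, which matches the (None, 2, 2, 1) output shape reported by model.summary() and the 2x2 grid of values printed above.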
    

    Pooling

    A pooling layer is another building block of a CNN. Its function is to progressively reduce the spatial size of the representation, which reduces the network's complexity and computational cost.
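
    Max Pooling

    Since the question is specifically about max pooling, here is a minimal sketch with MaxPooling2D (reusing the 5x5 single-channel input shape from the examples above) that contrasts the default pooling settings with padding='same' and strides=2, the combination described in the question's fix:

    from keras.layers import MaxPooling2D

    model = Sequential()
    model.add(Conv2D(1, (3,3), padding='same', input_shape=(5, 5, 1)))
    # with the defaults (padding='valid', strides equal to the pool size) the 5x5
    # feature map would shrink to 2x2 and the last row and column would be dropped;
    # with padding='same' and strides=2 the border values still contribute and the
    # output is 3x3 (ceil(5 / 2))
    model.add(MaxPooling2D((2,2), strides=2, padding='same'))
    model.summary()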

    Average Pooling

    from keras.layers import AveragePooling2D

    model = Sequential()
    model.add(Conv2D(1, (3,3), padding='same', input_shape=(5, 5, 1)))
    model.add(AveragePooling2D((2,2)))
    model.summary()
    # again, 'weights' and 'data' are assumed to be defined earlier
    model.set_weights(weights)
    yhat = model.predict(data)
    for r in range(yhat.shape[1]):
        print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
    Model: "sequential_7"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    conv2d_7 (Conv2D)            (None, 5, 5, 1)           10        
    _________________________________________________________________
    average_pooling2d_1 (Average (None, 2, 2, 1)           0         
    =================================================================
    Total params: 10
    Trainable params: 10
    Non-trainable params: 0
    _________________________________________________________________
    [11.5, 14.25]
    [9.5, 14.0]

    Adding a Flatten layer after pooling shows how the pooled feature map is turned into a vector before it is passed to the fully connected layers:

    from keras.layers import Flatten

    model = Sequential()
    model.add(Conv2D(1, (3,3), padding='same', input_shape=(5, 5, 1)))
    model.add(AveragePooling2D((2,2)))
    model.add(Flatten())
    model.summary()
    Model: "sequential_8"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    conv2d_8 (Conv2D)            (None, 5, 5, 1)           10        
    _________________________________________________________________
    average_pooling2d_2 (Average (None, 2, 2, 1)           0         
    _________________________________________________________________
    flatten_1 (Flatten)          (None, 4)                 0         
    =================================================================
    Total params: 10
    Trainable params: 10
    Non-trainable params: 0
    _________________________________________________________________
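
    Putting this back in the context of the question: a minimal sketch of the described architecture (3 convolutional layers with 32, 32 and 64 filters, each followed by max pooling with padding='same' and strides=2, then an FC layer with 256 units and a 3-way softmax; the 3x3 kernel size and RGB input are assumptions, since the question doesn't state them) could look like this:

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(32, (3,3), activation='relu', input_shape=(224, 224, 3)))
    model.add(MaxPooling2D((2,2), strides=2, padding='same'))
    model.add(Conv2D(32, (3,3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=2, padding='same'))
    model.add(Conv2D(64, (3,3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=2, padding='same'))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.summary()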