I am new to deep learning and CNNs. If a CNN has been created as shown in the screenshot, how can one explain the output of model.summary()? I am not able to understand the output shapes of the different layers.
Model summary:
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_14 (Conv2D)           (None, 29, 29, 32)        1568
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 14, 14, 32)        0
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 11, 11, 32)        16416
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 5, 5, 32)          0
_________________________________________________________________
flatten_3 (Flatten)          (None, 800)               0
_________________________________________________________________
dense_6 (Dense)              (None, 32)                25632
_________________________________________________________________
dense_7 (Dense)              (None, 10)                330
=================================================================
Total params: 43,946
Trainable params: 43,946
Non-trainable params: 0
Assume that the size of each image is (32, 32, 3), as per the question.
Keras then prepends a batch dimension so that several images can be processed in each training step of an epoch. Since the batch size can vary, it is represented by None. Hence, the input shape becomes (None, 32, 32, 3).
Convolving a (32, 32) image with a (4, 4) filter, with strides and dilation rate of 1 and 'valid' padding, results in an output of size (32 - 4 + 1, 32 - 4 + 1) = (29, 29). Since you have 32 such filters, the output shape becomes (29, 29, 32).
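The arithmetic above can be sketched as a small helper. This is only an illustration: conv_output_size is a hypothetical name, not a Keras function, and it assumes 'valid' padding, where a dilated kernel covers dilation * (kernel - 1) + 1 input positions.

```python
def conv_output_size(size, kernel, stride=1, dilation=1):
    """Output length along one spatial axis for 'valid' padding (hypothetical helper)."""
    # A dilated kernel spans this many input positions.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size - effective_kernel) // stride + 1


# First Conv2D layer: 32x32 input, 4x4 kernel, stride 1, dilation 1, 32 filters.
out_h = conv_output_size(32, kernel=4)
out_w = conv_output_size(32, kernel=4)
print((out_h, out_w, 32))  # (29, 29, 32)
```

With stride and dilation left at 1, this reduces to the (32 - 4 + 1) expression used above.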
The default MaxPooling kernel has a shape of (2, 2) and strides of (2, 2). Applying that to a (29, 29) image results in an image of shape ((29 - 2)//2 + 1, (29 - 2)//2 + 1) = (14, 14).
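This pattern can be extended to all Conv2D and MaxPooling layers: the second Conv2D layer turns (14, 14) into (14 - 4 + 1, 14 - 4 + 1) = (11, 11), and the second MaxPooling layer turns that into ((11 - 2)//2 + 1, (11 - 2)//2 + 1) = (5, 5).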
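To check the whole stack at once, the same two formulas can be applied layer by layer. This is a rough sketch in plain Python, not Keras code. It assumes that both Conv2D layers use a (4, 4) kernel with 32 filters (for conv2d_15 this is inferred from its parameter count of 16416) and that both pooling layers use the default (2, 2) window with stride 2.

```python
# (layer name from the summary, layer kind); kernel and pool sizes are assumptions.
layers = [
    ("conv2d_14", "conv"),
    ("max_pooling2d_6", "pool"),
    ("conv2d_15", "conv"),
    ("max_pooling2d_7", "pool"),
]

size = 32       # input height and width
channels = 32   # every Conv2D layer here has 32 filters
shapes = []
for name, kind in layers:
    if kind == "conv":
        size = size - 4 + 1          # 'valid' padding, 4x4 kernel, stride 1
    else:
        size = (size - 2) // 2 + 1   # 2x2 pooling window, stride 2
    shapes.append((name, (size, size, channels)))

for name, shape in shapes:
    print(name, shape)
# conv2d_14 (29, 29, 32)
# max_pooling2d_6 (14, 14, 32)
# conv2d_15 (11, 11, 32)
# max_pooling2d_7 (5, 5, 32)
```

The printed shapes match the Output Shape column of the summary, without the leading None for the batch dimension.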
The Flatten layer takes all pixels along all channels and creates a 1D vector (not considering batch size). Therefore, an input of (5, 5, 32) is flattened to (5 * 5 * 32) = 800 values.
Parameter count
The number of parameters for a Conv2D layer is given by:
(kernel_height * kernel_width * input_channels * output_channels) + (output_channels if bias is used).
So, for the first Conv2D layer with 3 input channels, 32 output channels and a kernel size of (4, 4), the number of parameters is (4 * 4 * 3 * 32) + 32 = 1568.
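The remaining rows of the Param # column can be checked the same way. The sketch below uses two hypothetical helpers (conv2d_params and dense_params are illustrative names, not Keras functions). It assumes a (4, 4) kernel for both Conv2D layers and uses the standard count for a Dense layer, (input_features * output_features) + (output_features if bias is used).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, use_bias=True):
    """Weights plus optional biases of a Conv2D layer (hypothetical helper)."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


def dense_params(in_features, out_features, use_bias=True):
    """Weights plus optional biases of a Dense layer (hypothetical helper)."""
    return in_features * out_features + (out_features if use_bias else 0)


counts = {
    "conv2d_14": conv2d_params(4, 4, 3, 32),   # 3 input channels (RGB)
    "conv2d_15": conv2d_params(4, 4, 32, 32),  # 32 channels from the previous layer
    "dense_6": dense_params(5 * 5 * 32, 32),   # 800 flattened inputs
    "dense_7": dense_params(32, 10),
}
total = sum(counts.values())

print(counts)
# {'conv2d_14': 1568, 'conv2d_15': 16416, 'dense_6': 25632, 'dense_7': 330}
print(total)  # 43946
```

The pooling and Flatten layers have no weights, so they contribute 0, and the total agrees with the 43,946 reported by the summary.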