tensorflow, keras, deep-learning, google-colaboratory, conv-neural-network

How to interpret model.summary() output in CNN?


I am new to deep learning and CNNs. If a CNN is created as shown in the screenshot, how can one explain the output reported by model.summary()? I am not able to understand the output shapes of the different layers.

Model summary:

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_14 (Conv2D)           (None, 29, 29, 32)        1568      
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 11, 11, 32)        16416     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 800)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 32)                25632     
_________________________________________________________________
dense_7 (Dense)              (None, 10)                330       
=================================================================
Total params: 43,946
Trainable params: 43,946
Non-trainable params: 0

[Image: neural network design]
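Since the screenshot itself is not reproduced here, a model that yields exactly this summary can be sketched as follows. The (4, 4) kernel size, (32, 32, 3) input shape, and activations are inferred from the summary and the question, so treat them as assumptions:

```python
# Hypothetical reconstruction of the model behind the summary above.
# Kernel size (4, 4) is inferred from the parameter counts; the
# activations are placeholders and do not affect shapes or params.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (4, 4), activation="relu"),
    layers.MaxPooling2D(),  # default pool_size=(2, 2), strides=(2, 2)
    layers.Conv2D(32, (4, 4), activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.summary()  # reproduces the layer shapes and 43,946 total params
```

Calling model.summary() on this reconstruction prints the same output shapes and parameter counts as the table in the question.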


Solution

  • Assume that the size of each image is (32, 32, 3), as per the question.

    Keras then prepends an extra dimension for the batch, i.e., to process multiple images in every step of a single epoch. Since the batch size can vary, it is represented by None. Hence, the input shape becomes (None, 32, 32, 3).

    Convolving a (32, 32) image with a (4, 4) filter, with strides and dilation rate of 1, and 'valid' padding, results in an output of size (32 - 4 + 1, 32 - 4 + 1) = (29, 29). Since you have 32 such filters, the output shape becomes (29, 29, 32).

    The default MaxPooling2D pool size is (2, 2) with strides of (2, 2). Applying that to a (29, 29) feature map results in an output of shape ((29 - 2)//2 + 1, (29 - 2)//2 + 1) = (14, 14).

    This pattern can be extended to all Conv2D and MaxPooling layers.
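The shape arithmetic above can be checked with a few lines of plain Python. This is a sketch of the two formulas just described: the 'valid'-padding convolution case and the default MaxPooling2D case:

```python
def conv_out(size, kernel, stride=1):
    # 'valid' convolution: (size - kernel) // stride + 1
    return (size - kernel) // stride + 1

def pool_out(size, pool=2, stride=2):
    # default MaxPooling2D: pool_size=(2, 2), strides=(2, 2)
    return (size - pool) // stride + 1

side = 32
side = conv_out(side, 4)   # 29  (conv2d_14)
side = pool_out(side)      # 14  (max_pooling2d_6)
side = conv_out(side, 4)   # 11  (conv2d_15)
side = pool_out(side)      # 5   (max_pooling2d_7)
print(side * side * 32)    # 800 values after Flatten
```

Tracing the spatial side length through each layer reproduces the 29, 14, 11, and 5 seen in the summary.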

    The Flatten layer takes all pixels along all channels and creates a 1D vector (not considering batch size). Therefore, an input of (5, 5, 32) is flattened to (5 * 5 * 32) = 800 values.

    Parameter count

    The number of parameters for a Conv2D layer is given by:

    (kernel_height * kernel_width * input_channels * output_channels) + (output_channels if bias is used).

    So, for the first Conv2D layer with 3 input channels, 32 output channels, and a (4, 4) kernel, the parameter count is (4 * 4 * 3 * 32) + 32 = 1568. Likewise, the second Conv2D layer has (4 * 4 * 32 * 32) + 32 = 16416 parameters. For a Dense layer the count is (input_units * output_units) + output_units, so dense_6 has (800 * 32) + 32 = 25632 and dense_7 has (32 * 10) + 10 = 330, giving the total of 43,946.
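As a sanity check, these formulas can be applied to every trainable layer in the summary. A minimal sketch, assuming bias is enabled everywhere (the Keras default):

```python
def conv2d_params(kh, kw, in_ch, out_ch, bias=True):
    # (kernel_h * kernel_w * in_channels * out_channels) + optional bias
    return kh * kw * in_ch * out_ch + (out_ch if bias else 0)

def dense_params(in_units, out_units, bias=True):
    # (input_units * output_units) + optional bias
    return in_units * out_units + (out_units if bias else 0)

counts = [
    conv2d_params(4, 4, 3, 32),    # conv2d_14 -> 1568
    conv2d_params(4, 4, 32, 32),   # conv2d_15 -> 16416
    dense_params(800, 32),         # dense_6   -> 25632
    dense_params(32, 10),          # dense_7   -> 330
]
print(counts, sum(counts))         # total should be 43946
```

MaxPooling2D and Flatten contribute 0 parameters, since they have no learnable weights, which is why the summary lists "0" for those rows.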