
How should I determine the input size for layers following the initial layer in a CNN?


I am working on CS50AI unit 5, and this is the code from the number recognition part of the lecture. If I wanted to add another convolutional layer after the max pooling layer, how would I determine its input shape? Would it be (IMG_WIDTH, IMG_HEIGHT, 3), or would I divide IMG_WIDTH and IMG_HEIGHT by 2 because of the max pooling?

Similarly, how is the number of nodes in the first dense layer (128) determined? Is it an arbitrary number that I can choose, or is it based on something else?

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        32, (5, 5), activation="relu", input_shape=(IMG_WIDTH, IMG_HEIGHT, 3)
    ),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CATEGORIES, activation="softmax")
])

For reference

IMG_WIDTH = 30
IMG_HEIGHT = 30
NUM_CATEGORIES = 3

Solution

  • You don't actually need to specify an input shape for any layer after the first. Keras infers each layer's input shape from the output shape of the layer before it, starting from the input_shape you give the first layer.

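    If you want to know the shapes anyway: with the default padding="valid" and strides=1, a 5x5 kernel shrinks each spatial dimension by 5 - 1 = 4, and the layer outputs one channel per filter. So the first convolutional layer outputs (IMG_WIDTH - 4, IMG_HEIGHT - 4, 32), i.e. (26, 26, 32); the channel count is now 32, not 3. The 2x2 max pooling layer then halves the height and width (rounding down), as you suggested, giving (13, 13, 32). That is the input shape a second convolutional layer would receive.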

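    The number of nodes in the dense layer (128 here) is a hyperparameter, so you can choose it freely. More nodes give the layer more capacity but also more weights to train; in practice the value is tuned by experiment.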
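The shape arithmetic above can be checked with a few lines of plain Python. This is a sketch assuming Keras defaults: strides=1 and padding="valid" for Conv2D, and strides equal to pool_size for MaxPooling2D. The helper names conv_output_size and dense_params are hypothetical, not part of Keras. The last loop also shows that the dense layer's size only changes how many weights it has: a dense layer with n inputs and u units has n*u weights plus u biases.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1) -> int:
    """Output length along one spatial axis for an unpadded ("valid") window.

    Assumed to match Conv2D with padding="valid" and MaxPooling2D,
    whose stride defaults to its pool size.
    """
    return (size - kernel) // stride + 1


def dense_params(in_features: int, units: int) -> int:
    """Trainable parameters of a dense layer: one weight per input per unit, plus one bias per unit."""
    return in_features * units + units


IMG_WIDTH = 30
IMG_HEIGHT = 30

# Conv2D(32, (5, 5)) shrinks 30 -> 26; MaxPooling2D((2, 2)) then halves 26 -> 13.
w = conv_output_size(conv_output_size(IMG_WIDTH, kernel=5), kernel=2, stride=2)
h = conv_output_size(conv_output_size(IMG_HEIGHT, kernel=5), kernel=2, stride=2)
after_pool = (w, h, 32)  # channel count = number of filters in the first Conv2D
print("input to a new conv layer:", after_pool)  # (13, 13, 32)

# Flatten turns (13, 13, 32) into 5408 features; the dense size is a free choice.
flat = w * h * 32
for units in (64, 128, 256):
    print(units, "units ->", dense_params(flat, units), "parameters")
```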
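To see Keras infer the shapes itself, here is a sketch of the lecture model with one extra convolutional layer added after the pooling layer. The extra Conv2D(64, (3, 3)) is a hypothetical choice for illustration, and it is given no input shape. The sketch uses tf.keras.Input for the first shape, which Keras reads as (height, width, channels); since both sides are 30 here, the order makes no difference. A dummy batch is passed through each layer to print the inferred output shapes.

```python
import tensorflow as tf

IMG_WIDTH = 30
IMG_HEIGHT = 30
NUM_CATEGORIES = 3

model = tf.keras.models.Sequential([
    # Only the first layer needs a shape: (height, width, channels).
    tf.keras.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    tf.keras.layers.Conv2D(32, (5, 5), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # Hypothetical extra layer: no input shape given, Keras infers (13, 13, 32).
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CATEGORIES, activation="softmax"),
])

# Push a dummy batch of one image through each layer and record its output shape.
x = tf.zeros((1, IMG_HEIGHT, IMG_WIDTH, 3))
shapes = []
for layer in model.layers:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.name:<16} {tuple(x.shape)}")
```

Calling model.summary() shows the same inferred shapes in its "Output Shape" column, without the batch dimension.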