machine-learning, keras, deep-learning, autoencoder

Convolutional Autoencoders


I am writing code to run an autoencoder on the CIFAR-10 dataset and view the reconstructed images.

The requirement is to create:

  • Encoder with a first layer

  • Encoder with a second layer

  • Decoder with a first layer

  • Decoder with a second layer

I understand that:

  1. When creating a convolutional autoencoder (or any AE), we need to pass the output of the previous layer to the next layer.
  2. So when I create the first Conv2D layer with ReLU and then perform BatchNormalization, I pass it the Conv2D output, right?
  3. But when I apply MaxPooling2D, what should I pass: the BatchNormalization output or the Conv2D output?

Also, is there a recommended order in which I should perform these operations?


My code is attached below. I have attempted it in two different ways and therefore get different outputs (both in the model summary and in the training graph).

Can someone please explain which is the correct method (Method-1 or Method-2)? Also, how do I tell which graph shows better model performance?

Method - 1

# imports (assuming the tensorflow.keras API)
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_image = Input(shape=(32, 32, 3))

### Encoder
conv1_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(input_image)
bnorm1_1 = BatchNormalization()(conv1_1)
mpool1_1 = MaxPooling2D((2, 2), padding='same')(conv1_1)

conv1_2 = Conv2D(16, (3, 3), activation='relu', padding='same')(mpool1_1)
bnorm1_2 = BatchNormalization()(conv1_2)

encoder = MaxPooling2D((2, 2), padding='same')(conv1_2)

### Decoder
conv2_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(encoder)
bnorm2_1 = BatchNormalization()(conv2_1)
up1_1 = UpSampling2D((2, 2))(conv2_1)

conv2_2 = Conv2D(32, (3, 3), activation='relu', padding='same')(up1_1)
bnorm2_2 = BatchNormalization()(conv2_2)
up2_1 = UpSampling2D((2, 2))(conv2_2)

decoder = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(up2_1)

model = Model(input_image, decoder)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()

history = model.fit(trainX, trainX, 
                    epochs=50, 
                    batch_size=1000, 
                    shuffle=True,
                    verbose=2,
                    validation_data=(testX, testX)
                    )

The model summary reports:

Total params: 18,851

Trainable params: 18,851

Non-trainable params: 0

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.show()

Model Loss Graph - 1


Method - 2

input_image = Input(shape=(32, 32, 3))

### Encoder
x = Conv2D(64, (3, 3), activation='relu', padding='same')(input_image)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)

encoder = MaxPooling2D((2, 2), padding='same')(x)

### Decoder

x = Conv2D(16, (3, 3), activation='relu', padding='same')(encoder)
x = BatchNormalization()(x)
x = UpSampling2D((2, 2))(x)

x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = UpSampling2D((2, 2))(x)

decoder = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

model = Model(input_image, decoder)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()

history = model.fit(trainX, trainX, 
                    epochs=50, 
                    batch_size=1000, 
                    shuffle=True,
                    verbose=2,
                    validation_data=(testX, testX)
                    )

The model summary reports:

Total params: 19,363

Trainable params: 19,107

Non-trainable params: 256

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.show()

Model Loss Graph - 2


Solution

  • In Method 1, the BatchNormalization layers do not exist in the compiled model, because their outputs are not used anywhere: MaxPooling2D and the subsequent Conv2D layers are wired to the Conv2D outputs, so the BatchNormalization nodes never lie on the path from input to output. You can check this by running model.summary().

    Method 2 is perfectly alright.

    Order of operations: Conv2D --> BatchNormalization --> MaxPooling2D is the most common approach, though either order would work, since BatchNorm is just mean-and-variance normalization.

    Edit:

    For Conv2D --> BatchNormalization --> MaxPooling2D :

    conv1_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(input_image)
    bnorm1_1 = BatchNormalization()(conv1_1)
    mpool1_1 = MaxPooling2D((2, 2), padding='same')(bnorm1_1)

    and then use mpool1_1 as the input to the next layer.

    For Conv2D --> MaxPooling2D --> BatchNormalization:

    conv1_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(input_image)
    mpool1_1 = MaxPooling2D((2, 2), padding='same')(conv1_1)
    bnorm1_1 = BatchNormalization()(mpool1_1)

    and then use bnorm1_1 as the input to the next layer.
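The dead-layer behavior in Method 1 can be verified directly. The following is a minimal sketch (assuming the tensorflow.keras API) that builds just the first encoder block both ways and compares the resulting models; it confirms that the unused BatchNormalization layer is absent and that the parameter gap is exactly the BatchNorm parameters (4 per channel: gamma, beta, moving mean, moving variance):

```python
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, MaxPooling2D
from tensorflow.keras.models import Model

inp = Input(shape=(32, 32, 3))
conv = Conv2D(64, (3, 3), activation='relu', padding='same')(inp)
bnorm = BatchNormalization()(conv)

# Method 1 wiring: MaxPooling2D consumes `conv`, so `bnorm` is never on
# the input -> output path and is silently dropped from the model.
pool_wrong = MaxPooling2D((2, 2), padding='same')(conv)
model_wrong = Model(inp, pool_wrong)

# Method 2 wiring: MaxPooling2D consumes the BatchNormalization output.
pool_right = MaxPooling2D((2, 2), padding='same')(bnorm)
model_right = Model(inp, pool_right)

names_wrong = [layer.__class__.__name__ for layer in model_wrong.layers]
names_right = [layer.__class__.__name__ for layer in model_right.layers]
print('BatchNormalization' in names_wrong)   # False: the layer was dropped
print('BatchNormalization' in names_right)   # True

# Parameter gap = 4 params/channel * 64 channels = 256
print(model_right.count_params() - model_wrong.count_params())   # 256
```

The same check on the full Method 1 model explains the summary numbers in the question: the four missing BatchNormalization layers account for the difference in total parameter counts, and the 256 non-trainable parameters in Method 2 are the moving statistics of its BatchNorm layers.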