I'm trying to classify colored MNIST digits with a basic CNN architecture in Keras. Here is the piece of code that colors each digit of the original dataset purely red, green, or blue:
import numpy as np
from tensorflow import keras
import matplotlib.pyplot as plt

def load_norm_data():
    # load basic mnist
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    train_images = np.zeros((*x_train.shape, 3))  # orig shape: (60000, 28, 28) -> rgb shape: (60000, 28, 28, 3)
    for num in range(x_train.shape[0]):
        rgb = np.random.randint(3)  # pick one of the three channels at random
        train_images[num, ..., rgb] = x_train[num] / 255
    return train_images, y_train
if __name__ == '__main__':
    ims, labels = load_norm_data()
    for num in range(10):
        plt.subplot(2, 5, num + 1)
        plt.imshow(ims[num])
        plt.axis('off')
    plt.show()
which gives the following for the first few digits:

[image: a 2x5 grid of the first ten MNIST digits, each rendered purely in red, green, or blue]
Then I attempt to classify this colored dataset into the same 10 digit classes of MNIST, so the labels don't change -- and yet the model's accuracy drops from 95% on non-colored MNIST to a wildly variable 30-70% on colored MNIST, depending heavily on weight initialization. Please find below the architecture of said model:
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2), padding='same'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10, activation='relu'))
model.add(keras.layers.Softmax())
model.build(ims.shape)  # (60000, 28, 28, 3); the first dimension is treated as the batch dimension
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])  # labels are integers, so use the sparse loss
model.summary()
model.fit(ims, labels, batch_size=12, epochs=25)
Initially, I thought this drop in performance might be linked to an irregularity in the data (e.g. imagine a lot of 3s in the data ended up being green, so the model learns green = 3). So I checked the data: the class counts are balanced, and the rgb distribution within each class is close to 33% per color too (a check along the lines of the sketch below). I also looked at the misclassified images to see whether a certain color or digit was over-represented among them, but that doesn't seem to be the case either.

In any case, after reading Keras' documentation, and because Conv2D forces you to pass it a 2-dimensional kernel_size that I imagine therefore operates across all channels of the input image, the model shouldn't be taking color into account for classification here.
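For reference, the color-balance check can be done with something like this (a minimal sketch, not the exact code I ran; it recovers each image's color as the channel holding the ink, using ims and labels from load_norm_data):

# Each image has ink in exactly one channel, so the channel with the
# largest total intensity identifies its color (0=R, 1=G, 2=B).
colors = ims.sum(axis=(1, 2)).argmax(axis=1)
for digit in range(10):
    counts = np.bincount(colors[labels == digit], minlength=3)
    print(digit, counts / counts.sum())  # each fraction should be near 0.33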
Am I missing something here?
The last part of the model is dense -> relu -> softmax; the relu activation there should be removed. In addition, you would benefit from adding non-linearities (e.g. relu) inside your convolutional blocks. Otherwise the network is (max-pooling aside) essentially one big linear function and will not model non-linear data well.
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2), padding='same'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10))
model.add(keras.layers.Softmax())
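The corrected model can be compiled and trained the same way as before (a sketch, assuming ims and labels from load_norm_data, with the sparse loss since the labels are plain integers):

model.build((None, 28, 28, 3))  # batch dimension left unspecified
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(ims, labels, batch_size=12, epochs=25)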
It is interesting that the original model worked well on the plain MNIST dataset. I cannot say for sure why, but perhaps that dataset is simple enough that the model was able to cope. Also, the relu -> softmax combination clamps negative logits to 0, and maybe there were not many negative values to begin with.
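To illustrate the clamping with a toy example (not from the original post): with a relu in front of the softmax, every negative logit is squashed to 0, so the model can no longer distinguish between classes it scores negatively:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([-2.0, -1.0, 0.5])
print(softmax(logits))                 # three distinct probabilities
print(softmax(np.maximum(logits, 0)))  # relu first: the two negative logits tie at 0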