I have an autoencoder, and I have checked its accuracy with several variations: changing the number of conv layers and increasing them, adding or removing Batch Normalization, and changing the activation function. The accuracy for all of them is similar and shows no improvement, which is strange. I'm confused, because I would expect the accuracy to differ between these variations, yet it stays at 0.8156. Could you please help me figure out what the problem is? I also trained it for 10000 epochs, but the output is the same as for 50 epochs! Is my code wrong, or can it simply not get any better?

(accuracy graph)

I am also not sure whether the learning rate decay is working or not. I put my code here too:
from keras.layers import Input, Concatenate, GaussianNoise,Dropout,BatchNormalization
from keras.layers import Conv2D
from keras.models import Model
from keras.datasets import mnist,cifar10
from keras.callbacks import TensorBoard
from keras import backend as K
from keras import layers
import matplotlib.pyplot as plt
import tensorflow as tf
import keras as Kr
from keras.callbacks import ReduceLROnPlateau
from keras.callbacks import EarlyStopping
import numpy as np
import pylab as pl
import matplotlib.cm as cm
import keract
from matplotlib import pyplot
from keras import optimizers
from keras import regularizers
from keras.layers import Lambda  # currently unused; avoid mixing tensorflow.python.keras with keras imports
image = Input((28, 28, 1))
conv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1e')(image)
conv2 = Conv2D(32, (3, 3), activation='elu', padding='same', name='convl2e')(conv1)
conv3 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl3e')(conv2)
#conv3 = Conv2D(8, (3, 3), activation='relu', padding='same', name='convl3e', kernel_initializer='Orthogonal',bias_initializer='glorot_uniform')(conv2)
BN=BatchNormalization()(conv3)
#DrO1=Dropout(0.25,name='Dro1')(conv3)
DrO1=Dropout(0.25,name='Dro1')(BN)
encoded = Conv2D(1, (3, 3), activation='elu', padding='same',name='encoded_I')(DrO1)
#-----------------------decoder------------------------------------------------
#------------------------------------------------------------------------------
deconv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1d')(encoded)
deconv2 = Conv2D(32, (3, 3), activation='elu', padding='same', name='convl2d')(deconv1)
deconv3 = Conv2D(16, (3, 3), activation='elu',padding='same', name='convl3d')(deconv2)
BNd=BatchNormalization()(deconv3)
DrO2=Dropout(0.25,name='DrO2')(BNd)
#DrO2=Dropout(0.5,name='DrO2')(deconv3)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same', name='decoder_output')(DrO2)
#model=Model(inputs=[image,wtm],outputs=decoded)
#--------------------------------adding noise----------------------------------
#decoded_noise = GaussianNoise(0.5)(decoded)
watermark_extraction=Model(inputs=image,outputs=decoded)
watermark_extraction.summary()
#----------------------training the model--------------------------------------
#------------------------------------------------------------------------------
#----------------------Data preparation----------------------------------------
(x_train, _), (x_test, _) = mnist.load_data()
x_validation = x_train[:10000, :, :]   # first 10,000 samples for validation
x_train = x_train[10000:, :, :]        # remaining 50,000 samples for training
#(x_train, _), (x_test, _) = cifar10.load_data()
#x_validation=x_train[1:10000,:,:]
#x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))
#---------------------compile and train the model------------------------------
# is accuracy sensible metric for this model?
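# note: Keras' SGD applies `decay` once per batch update, as
#   lr_t = lr / (1 + decay * iterations)
# so the decay below does take effect, just gradually rather than once per epoch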
learning_rate = 0.1
decay_rate = learning_rate / 50
opt = optimizers.SGD(lr=learning_rate, momentum=0.9, decay=decay_rate, nesterov=False)
watermark_extraction.compile(optimizer=opt, loss='mse', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)
#rlrp = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_delta=1E-7, verbose=1)
history=watermark_extraction.fit(x_train, x_train,
epochs=50,
batch_size=32,
validation_data=(x_validation, x_validation),
callbacks=[TensorBoard(log_dir='E:/output of tensorboard', histogram_freq=0, write_graph=False),es])
watermark_extraction.summary()
#--------------------visuallize the output layers------------------------------
#_, train_acc = watermark_extraction.evaluate(x_train, x_train)
#_, test_acc = watermark_extraction.evaluate([x_test[5000:5001],wt_expand], [x_test[5000:5001],wt_expand])
#print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
## plot loss learning curves
pyplot.subplot(211)
pyplot.title('MSE Loss', pad=-40)
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.subplot(212)
pyplot.title('Accuracy', pad=-40)
pyplot.plot(history.history['acc'], label='train')
pyplot.plot(history.history['val_acc'], label='validation')
pyplot.legend()
pyplot.show()
Since you stated that you are a beginner, I am going to try to build things up from the bottom and tie the explanation to your code as much as possible.
Part 1 Autoencoders are made up of two parts (an encoder and a decoder). The encoder decreases the number of variables required to store the information, and the decoder tries to recover the information from its compressed form. (Note that autoencoders are not used in real data-compression tasks, due to their uncertainty and their data-dependent nature.)
Now, in your code you keep padding='same' in every convolutional layer and never downsample (no pooling, no strides):
conv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1e')(image)
This takes away the compression-and-expansion feature of the autoencoder: at every step you are using at least as many variables to represent the information as the input contains. In particular, your encoded layer is 28x28x1, exactly the size of the input image, so nothing forces the network to learn a compact representation. One common fix is sketched below.
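As an illustration, assuming my own choice of layer sizes rather than your exact architecture (the 7x7x8 bottleneck holds half as many values as the 28x28x1 input):

from keras.layers import MaxPooling2D, UpSampling2D

# encoder: 28x28x1 -> 7x7x8 (a real bottleneck)
x = Conv2D(16, (3, 3), activation='elu', padding='same')(image)
x = MaxPooling2D((2, 2), padding='same')(x)                     # 28x28 -> 14x14
x = Conv2D(8, (3, 3), activation='elu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)               # 14x14 -> 7x7

# decoder: 7x7x8 -> 28x28x1
x = Conv2D(8, (3, 3), activation='elu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)                                     # 7x7 -> 14x14
x = Conv2D(16, (3, 3), activation='elu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                                     # 14x14 -> 28x28
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)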
Part 2 Now moving on to how you train the model:
history=watermark_extraction.fit(x_train, x_train,
epochs=50,
batch_size=32,
validation_data=(x_validation, x_validation),
callbacks=[TensorBoard(log_dir='E:/output of tensorboard', histogram_freq=0, write_graph=False),es])
From this line of code I conclude that you want to generate back the same image that you put in. Since the image is stored in the same number of variables at every layer, your model just has to pass the image through unchanged, which incentivizes each filter to learn an identity mapping (for a 3x3 kernel, a 1 in the centre and 0 everywhere else). A common way to break that shortcut is a denoising setup: corrupt the inputs and train the network to reconstruct the clean images, so that copying the input through is no longer optimal; see the sketch below.
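A minimal sketch of that denoising setup, reusing your variable names (the noise level 0.5 is an arbitrary choice of mine, not a tuned value; your commented-out GaussianNoise layer aims at the same idea):

noise_factor = 0.5  # arbitrary; tune for your data
x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0., 1.)
x_val_noisy = np.clip(x_validation + noise_factor * np.random.normal(size=x_validation.shape), 0., 1.)
# noisy images in, clean images out: the identity mapping no longer wins
history = watermark_extraction.fit(x_train_noisy, x_train,
                                   epochs=50, batch_size=32,
                                   validation_data=(x_val_noisy, x_validation))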
Part 3 Now comes the biggest nail in the coffin: you have implemented a Dropout layer. First, you should generally NOT apply plain Dropout to convolutional feature maps; this link explains why, and it discusses various ideas that are worth checking out if you are a beginner. Now let's see why the way you have used Dropout is really bad here. As already explained, the best fit for your model is for every filter to learn an identity mapping. What Dropout does here is force some of those feature maps to turn off at random, which, as discussed in the article, does nothing other than switching off some filters and lowering the intensity of the activations reaching the next layer (since each conv filter sums over all of its input channels).
DrO2=Dropout(0.25,name='DrO2')(BNd)
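If you do want regularization on convolutional layers, one alternative I would suggest (my suggestion, not something from your code) is SpatialDropout2D, which drops whole feature maps instead of individual activations; for a plain reconstruction task you can also just remove the Dropout layers entirely.

from keras.layers import SpatialDropout2D

# drops entire 2D feature maps rather than single pixels, which respects
# the strong spatial correlation inside each map
DrO2 = SpatialDropout2D(0.25, name='DrO2')(BNd)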
Part 4 This is just a bit of advice, and not something that will be a source of any problem:
BNd=BatchNormalization()(deconv3)
Here you have tried to normalize the activations over each batch. Data normalization is extremely important in most cases: as you might know, it doesn't let one feature dictate the model, and each feature gets an equal say. But in image data every point is already on the same 0-to-255 scale, and you rescale it to the 0-to-1 range before training anyway, so the normalization adds little value here and just adds unnecessary computation to the model, as sketched below.
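A minimal sketch of that simplification, using your own layer names (just wire convl3d straight into the output and drop the BatchNormalization and Dropout calls):

deconv3 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl3d')(deconv2)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same', name='decoder_output')(deconv3)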
I would suggest that you work through this part by part and comment below if something is unclear. Try not to make this only about autoencoders built from CNNs (they don't have many real applications anyway), but rather use it to understand the various intricacies of ConvNets (CNNs). The reason I have chosen to explain parts of your network instead of just handing over code is that the code for what you are looking for is only a Google search away. If you are intrigued by this answer and want to know how exactly CNNs work, check this out: https://www.youtube.com/watch?v=ArPaAX_PhIs&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF. If you have any doubts about anything in this answer, or even about those videos, comment below.