I am comparing fitting accuracy across data of different quality. "Good" data has no NA in its feature values. "Bad" data contains NA in its feature values and has to be repaired by some value correction, for example by replacing each NA with zero or with the feature's mean value.
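To make the two value corrections concrete, here is a minimal NumPy sketch. It assumes NA is represented as NaN in a 2-D float array with one feature per column; the helper name `fill_na` and the sample array are hypothetical and not part of the original code.

```python
import numpy as np

def fill_na(x, strategy="mean"):
    """Return a copy of x with NaN entries replaced (hypothetical helper)."""
    x = np.array(x, dtype=float)  # np.array copies, so the input stays untouched
    mask = np.isnan(x)
    if strategy == "zero":
        x[mask] = 0.0
    elif strategy == "mean":
        # Per-column mean computed over the non-NaN entries only.
        col_means = np.nanmean(x, axis=0)
        # For every NaN position, look up the mean of its column.
        x[mask] = np.take(col_means, np.nonzero(mask)[1])
    else:
        raise ValueError("strategy must be 'zero' or 'mean'")
    return x

# Made-up example data: two features, two missing values.
x_bad = np.array([[1.0, np.nan],
                  [3.0, 4.0],
                  [np.nan, 8.0]])

x_zero = fill_na(x_bad, "zero")
x_mean = fill_na(x_bad, "mean")
```

A column that consists only of NaN has no defined mean, so the "mean" strategy would leave NaN in place there; such columns need separate handling.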
My code performs several fitting procedures in sequence.
Here is the simplified code:
from keras import backend as K
...
xTrainGood = ... # the good version of the xTrain data
xTrainBad = ... # the bad version of the xTrain data
...
model = Sequential()
model.add(...)
...
historyGood = model.fit(..., xTrainGood, ...) # fitting the model with
# the original data without
# NA, zeroes, or the feature mean values
Here is the fitting accuracy plot, based on the historyGood data:
After that, the code resets the stored model and retrains it with the "bad" data:
K.clear_session()
historyBad = model.fit(..., xTrainBad, ...)
Here are the fitting process results, based on the historyBad data:
As one can see, the initial accuracy is > 0.7, which means the model "remembers" the previous fitting.
For comparison, these are the results of fitting on the "bad" data standalone:
How can I reset the model to its "initial" state?
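K.clear_session() isn't enough to reset states and ensure reproducibility. You'll also need to set (and reset) the random seeds, delete the previous model, and reset the TensorFlow default graph.
Code accomplishing each is below.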
reset_seeds()
model = make_model() # example function to instantiate model
model.fit(x_good, y_good)
del model
K.clear_session()
tf.compat.v1.reset_default_graph()
reset_seeds()
model = make_model()
model.fit(x_bad, y_bad)
Note that if other variables reference the model, you should del them also (e.g. model = make_model(); model2 = model --> del model, model2), else they may persist. Lastly, tf random seeds aren't as easily reset as random's or numpy's, and require the graph to be cleared beforehand.
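The point about extra references can be illustrated without TensorFlow. The sketch below uses a hypothetical `DummyModel` class as a stand-in for a Keras model and a weak reference as a probe; it assumes CPython, where an object is freed once its last strong reference is gone.

```python
import gc
import weakref

class DummyModel:
    """Hypothetical stand-in for a Keras model; it holds no real weights."""

model = DummyModel()
model2 = model               # second name bound to the same object
probe = weakref.ref(model)   # weak reference: does not keep the object alive

del model
gc.collect()
# The object is still reachable through model2, so it persists.
alive_after_first_del = probe() is not None

del model2
gc.collect()
# No strong references remain, so the object has been freed.
alive_after_second_del = probe() is not None
```

Here `alive_after_first_del` is True and `alive_after_second_del` is False: deleting only one of the two names leaves the object, and with a real model its trained weights, in memory.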
import tensorflow as tf
import numpy as np
import random
import keras.backend as K

def reset_seeds():
    # Reseed every random number generator involved in weight initialization.
    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)   # TensorFlow 2.x API
    else:
        tf.set_random_seed(3)   # TensorFlow 1.x API
    print("RANDOM SEEDS RESET")
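As a quick sanity check of why reseeding matters, the sketch below covers only the `random` and `numpy` generators (TensorFlow is left out so that it runs anywhere). The names `reset_basic_seeds` and `draw` are hypothetical, and the seed value is arbitrary.

```python
import random
import numpy as np

def reset_basic_seeds(seed=0):
    """Reseed Python's and NumPy's global generators (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)

def draw():
    """Draw one value from each generator, standing in for weight init."""
    return random.random(), np.random.rand(3).tolist()

reset_basic_seeds()
first = draw()
second = draw()      # generators have advanced, so the values differ

reset_basic_seeds()
third = draw()       # same seed again, so this repeats the first draw
```

`first` equals `third` but not `second`, which mirrors the situation in the answer: a model built without reseeding starts from different initial weights than the one built before it.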