python tensorflow keras tf.keras keras-2

The clear_session() method of keras.backend does not clean up the fitting data


I am comparing fitting accuracy across data of different quality. "Good" data has no NA in the feature values. "Bad" data has NA in the feature values and must be fixed by a value correction, for example replacing each NA with zero or with the feature mean.
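To make the two corrections concrete, here is a minimal NumPy sketch. It assumes NA is encoded as NaN in a float array. The array x_bad and the helper names fill_na_with_zero and fill_na_with_mean are hypothetical, made up for illustration, and are not part of the actual project code.

```python
import numpy as np


def fill_na_with_zero(x):
    """Return a copy of x with every NaN replaced by 0."""
    return np.where(np.isnan(x), 0.0, x)


def fill_na_with_mean(x):
    """Return a copy of x with every NaN replaced by its column (feature) mean."""
    col_means = np.nanmean(x, axis=0)  # per-feature mean, ignoring NaN entries
    return np.where(np.isnan(x), col_means, x)


# Hypothetical "bad" feature matrix: 3 samples, 2 features, 2 missing values.
x_bad = np.array([[1.0, np.nan],
                  [3.0, 4.0],
                  [np.nan, 8.0]])

x_zero = fill_na_with_zero(x_bad)  # NaN -> 0
x_mean = fill_na_with_mean(x_bad)  # NaN -> mean of the remaining values in that column
```

Both helpers return new arrays and leave x_bad untouched, so the same "bad" input can be reused for each correction being compared.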

In my code, I am trying to perform multiple fitting procedures.

Here is the simplified code:

from keras import backend as K
...

xTrainGood = ... # the good version of the xTrain data 

xTrainBad = ... #  the bad version of the xTrain data

...

model = Sequential()

model.add(...)

...

historyGood = model.fit(..., xTrainGood, ...) # fitting the model with 
                                              # the original data without
                                              # NA, zeroes, or the feature mean values

Here is the fitting accuracy plot based on the historyGood data:

[Figure: fitting accuracy plot for the "good" data]

After that, the code resets the stored model and retrains it with the "bad" data:

K.clear_session()

historyBad = model.fit(..., xTrainBad, ...)

Here are the fitting results based on the historyBad data:

[Figure: fitting accuracy plot for the "bad" data, fitted after the "good" data]

As you can see, the initial accuracy is already above 0.7, which means the model "remembers" the previous fitting.

For comparison, these are the results of fitting the "bad" data on its own:

[Figure: fitting accuracy plot for the "bad" data, fitted standalone]

How can I reset the model to its "initial" state?


Solution

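  • K.clear_session() isn't enough to reset states and ensure reproducibility. In the question, fit() is called again on the same model object, which still holds its trained weights. You'll also need to:

      • reset the random seeds
      • delete the old model and instantiate a new one
      • reset the TensorFlow default graph

    Code accomplishing each below.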

    reset_seeds()
    model = make_model() # example function to instantiate model
    model.fit(x_good, y_good)
    
    del model
    K.clear_session()
    tf.compat.v1.reset_default_graph()
    
    reset_seeds()
    model = make_model()
    model.fit(x_bad, y_bad)
    

    Note that if other variables reference the model, you should del them too, or the model may persist. For example, after model = make_model(); model2 = model, run del model, model2. Lastly, TensorFlow's random seeds aren't as easily reset as random's or NumPy's, and require the graph to be cleared beforehand.
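The point about extra references can be checked with plain Python. The sketch below uses FakeModel, a hypothetical stand-in class (not a Keras API), and a weak reference to observe when the object is actually freed. It relies on CPython's reference counting.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a Keras model; any Python object behaves the same."""


model = FakeModel()
model2 = model              # second name bound to the same object
probe = weakref.ref(model)  # observes the object without keeping it alive

del model
gc.collect()
alive_after_first_del = probe() is not None   # True: model2 still holds the object

del model2
gc.collect()
alive_after_second_del = probe() is not None  # False: no strong references remain
```

Only once every name bound to the object is deleted can its memory be reclaimed, which is why del model alone may not be enough.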


    Function/modules used:

    import tensorflow as tf
    import numpy as np
    import random
    import keras.backend as K
    
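    def reset_seeds():
        # Reseed NumPy, Python's random module, and TensorFlow.
        np.random.seed(1)
        random.seed(2)
        # tf.random.set_seed is the TF 2.x name; TF 1.x uses tf.set_random_seed.
        if int(tf.__version__.split(".")[0]) >= 2:
            tf.random.set_seed(3)
        else:
            tf.set_random_seed(3)
        print("RANDOM SEEDS RESET")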
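One way to keep the steps together is to wrap them in a helper, so that every fit starts from a newly built model. This is only a sketch, assuming TensorFlow 2.x with tf.keras. The helper fit_fresh, the tiny architecture inside make_model, and the synthetic data are all hypothetical placeholders, not the asker's real model or data.

```python
import random

import numpy as np
import tensorflow as tf


def reset_seeds(seed=0):
    # Reseed Python, NumPy, and TensorFlow (TF 2.x API only in this sketch).
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)


def make_model(n_features):
    # Hypothetical architecture; substitute the real layers here.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def fit_fresh(x, y, epochs=2):
    # Clear Keras state, reseed, then build a brand-new model so that no
    # trained weights carry over from an earlier fit.
    tf.keras.backend.clear_session()
    reset_seeds()
    model = make_model(x.shape[1])
    history = model.fit(x, y, epochs=epochs, verbose=0)
    return history.history  # dict of per-epoch metrics


# Synthetic stand-in data (assumption: binary labels, 4 features).
rng = np.random.default_rng(0)
x_good = rng.normal(size=(64, 4)).astype("float32")
y = (x_good[:, 0] > 0).astype("float32")
x_bad = x_good.copy()
x_bad[::5, 1] = 0.0  # pretend these entries were NA and were replaced with zero

history_good = fit_fresh(x_good, y)
history_bad = fit_fresh(x_bad, y)
```

Because the model only exists inside fit_fresh, no outside variable keeps a reference to it, and the second call cannot start from the weights learned in the first.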