python, tensorflow, keras

Running model.fit multiple times without reinstantiating the model


Background

I am watching a popular YouTube crash course on machine learning.

At 3:35:50, he mentions that the model is likely overfit, so he fits it again with fewer epochs.

Since he didn't reinstantiate the model, isn't this equivalent to fitting the model with that same data, thereby continuing to overtrain it?

My Question

Assume you have a model created and data ready to go.

You run:

model.fit(train_images, train_labels, epochs=10)
model.fit(train_images, train_labels, epochs=8)

Is this equivalent to running:

model.fit(train_images, train_labels, epochs=18)

Or:

model.fit(train_images, train_labels, epochs=8)

If the previous training is overwritten, why does running model.fit a second time begin at the accuracy the previous run reached?

In multiple other questions regarding saving and training models, the accepted solutions are to load the previously trained model, and run model.fit again.

If this will overwrite the pre-existing weights, doesn't that defeat the purpose of saving the model in the first place? Wouldn't training the model for the first time on the new data be equivalent?

What is the appropriate way to train a model across multiple, similar datasets while retaining accuracy across all of the data?


Solution

  • Since he didn't reinstantiate the model, isn't this equivalent to fitting the model with that same data, thereby continuing to overtrain it?

    You are correct! To fairly compare the two epoch counts in his example, he should have re-created the model (that is, re-executed the cell that builds and compiles it), so that training starts again from freshly initialized weights. Note that compiling alone does not reset the weights; re-instantiating the model does.

    Just remember that, in general, whenever you instantiate a model again it will start from new randomly initialized weights, different from the previous ones (unless you seed or set them manually). So even with the same number of epochs, your final accuracy can vary depending on those initial weights.
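    You can see this directly: two fresh instances of the same architecture start from different random weights. A minimal sketch, assuming TensorFlow is installed; the tiny model here is made up purely for illustration:

```python
import numpy as np
from tensorflow import keras

def build_model():
    # Hypothetical toy model; the exact architecture does not matter here
    return keras.Sequential([
        keras.Input(shape=(3,)),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(1),
    ])

a = build_model()
b = build_model()

# Dense kernels are randomly initialized, so two fresh instances differ
identical = all(
    np.array_equal(wa, wb) for wa, wb in zip(a.get_weights(), b.get_weights())
)
print(identical)  # almost certainly False
```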

    Are these two commands equivalent?

    model.fit(train_images, train_labels, epochs=10)
    model.fit(train_images, train_labels, epochs=8)
    

    and

    model.fit(train_images, train_labels, epochs=18)
    

    Almost, yes — to the epochs=18 run, not to the epochs=8 one.

    In the first case, the network starts from some initial weights X and passes over the full training set 10 times, ending with updated weights X + y. The second fit call does not reset anything: it resumes from X + y and makes 8 more passes, so in total the network has seen the data 18 times.

    That is why the second call begins at the accuracy the first one reached, and why it is not the same as a fresh run of 8 epochs. The outcome is close to a single run of 18 epochs; small differences can come from things that restart with each fit call, such as the data-shuffling order and epoch-based callbacks or learning-rate schedules. This is also why saving a model and calling fit on it later is useful: training continues from the saved weights rather than starting over.
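    The continuation behavior can be checked directly: after a first fit call, a second call keeps moving the weights from where the first left off instead of resetting them. A minimal sketch, assuming TensorFlow is installed; the toy regression data and single-layer model are invented for illustration:

```python
import numpy as np
from tensorflow import keras

# Toy regression data (illustrative only): targets are the sum of the inputs
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3)).astype("float32")
y = x.sum(axis=1, keepdims=True)

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

# First call: 10 passes over the data, starting from the initial weights X
model.fit(x, y, epochs=10, verbose=0)
w_after_first = [w.copy() for w in model.get_weights()]

# Second call: resumes from the weights the first call ended with (X + y),
# rather than resetting the model
model.fit(x, y, epochs=8, verbose=0)
w_after_second = model.get_weights()

weights_moved = not all(
    np.array_equal(a, b) for a, b in zip(w_after_first, w_after_second)
)
print(weights_moved)  # True: training continued from the previous state
```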