python, keras, tf.keras, early-stopping

Why doesn't restore_best_weights=True update results?


I found that restore_best_weights=True does not actually seem to restore the best result. A simplified example with some dummy data:

import numpy as np
from tensorflow.keras.utils import set_random_seed
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping

np.random.seed(1)
set_random_seed(2)

x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 4., 2., 5.])

model = Sequential()
model.add(Dense(2, input_shape=(1,), activation='tanh'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1))

model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')
stopmon = EarlyStopping(monitor='loss', patience=2, restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon])
res = model.evaluate(x, y, verbose=1)
print(f'best={stopmon.best:.4f}, loss={res:.4f}')

The output (on my system) is:

Epoch 1/100
1/1 - 0s - loss: 11.8290 - 434ms/epoch - 434ms/step
Epoch 2/100
1/1 - 0s - loss: 1.9091 - 0s/epoch - 0s/step
Epoch 3/100
1/1 - 0s - loss: 1.5159 - 16ms/epoch - 16ms/step
Epoch 4/100
1/1 - 0s - loss: 1.3921 - 0s/epoch - 0s/step
Epoch 5/100
1/1 - 0s - loss: 1.6787 - 0s/epoch - 0s/step
Epoch 6/100
Restoring model weights from the end of the best epoch: 4.
1/1 - 0s - loss: 2.0629 - 33ms/epoch - 33ms/step
Epoch 6: early stopping
1/1 [==============================] - 0s 100ms/step - loss: 1.6787
best=1.3921, loss=1.6787

It looks like the weights are restored to those from epoch 4. Why, then, does the loss still evaluate to the higher value from epoch 6? Is there anything extra I need to do to update the model?

I am using an up-to-date TensorFlow (version 2.12.0) on Windows x64 (Intel); tf.version.COMPILER_VERSION == 'MSVC 192930140'.


Solution

  • I think it has something to do with how the loss is calculated during the training step,
    but the restore itself still works, at least when monitoring val_loss.
    I have run two tests.

    1. Without validation:

    import numpy as np
    from tensorflow.keras.utils import set_random_seed
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import RMSprop
    from tensorflow.keras.callbacks import EarlyStopping

    np.random.seed(1)
    set_random_seed(2)

    x = np.random.randn(1000)
    y = np.random.randn(1000)

    model = Sequential()
    model.add(Dense(2, input_shape=(1,), activation='tanh'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(1))

    model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')
    stopmon = EarlyStopping(monitor='loss', patience=2, restore_best_weights=True, verbose=1)
    history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon])
    res = model.evaluate(x, y, verbose=1)
    print(f'best={stopmon.best:.4f}, loss={res:.4f}')
    

    The output is:

    Epoch 1/100
    32/32 - 0s - loss: 0.9681 - 468ms/epoch - 15ms/step
    Epoch 2/100
    32/32 - 0s - loss: 0.9515 - 33ms/epoch - 1ms/step
    Epoch 3/100
    32/32 - 0s - loss: 0.9675 - 30ms/epoch - 953us/step
    Epoch 4/100
    Restoring model weights from the end of the best epoch: 2.
    32/32 - 0s - loss: 0.9596 - 37ms/epoch - 1ms/step
    Epoch 4: early stopping
    32/32 [==============================] - 0s 952us/step - loss: 1.0256
    best=0.9515, loss=1.0256
    

    You can see that, strangely, the final loss is higher than anything that was logged. This is most likely due to how the loss is calculated during the training step: the loss that fit() logs for an epoch is a running average over that epoch's batches, computed while the weights are still being updated, so it does not correspond to the weights as they stand at the end of the epoch. model.evaluate(), by contrast, uses one fixed set of weights for the whole dataset (see the first sketch after these tests).

    2. With a validation step:

    np.random.seed(1)
    set_random_seed(2)
    
    x = np.random.randn(1000)
    y = np.random.randn(1000)
    
    x2 = np.random.randn(50)
    y2 = np.random.randn(50)
    
    model = Sequential()
    model.add(Dense(2, input_shape=(1,), activation='tanh'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(1))
    
    model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')
    stopmon = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True, verbose=1)
    history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon], validation_data=(x2, y2))
    res = model.evaluate(x2, y2, verbose=1)
    print(f'best={stopmon.best:.4f}, loss={res:.4f}')
    

    And the output is:

    Epoch 1/100
    32/32 - 1s - loss: 0.9681 - val_loss: 1.0496 - 626ms/epoch - 20ms/step
    Epoch 2/100
    32/32 - 0s - loss: 0.9515 - val_loss: 0.9901 - 57ms/epoch - 2ms/step
    Epoch 3/100
    32/32 - 0s - loss: 0.9675 - val_loss: 1.0150 - 57ms/epoch - 2ms/step
    Epoch 4/100
    Restoring model weights from the end of the best epoch: 2.
    32/32 - 0s - loss: 0.9596 - val_loss: 1.0154 - 57ms/epoch - 2ms/step
    Epoch 4: early stopping
    2/2 [==============================] - 0s 2ms/step - loss: 0.9901
    best=0.9901, loss=0.9901
    

    You can see that in this case they match: val_loss is computed once per epoch, after all of that epoch's weight updates, with one fixed set of weights, so the value EarlyStopping tracks corresponds exactly to the weights it restores. In conclusion, restore_best_weights=True does work; it is only with the training loss that the comparison is misleading, because of how that loss is calculated during the training step.
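
    To make that first point concrete, here is a minimal sketch (reusing model, x and y from test 1) that prints, for each epoch, both the running-average loss that fit() logs and the loss recomputed with model.evaluate() using the weights as they stand at the end of that epoch. The two values differ because the weights keep changing between the batches that the logged average is taken over:

    from tensorflow.keras.callbacks import LambdaCallback

    # Compare the running-average loss that fit() reports with the loss
    # recomputed on the full data using the end-of-epoch weights.
    compare = LambdaCallback(
        on_epoch_end=lambda epoch, logs: print(
            f"epoch {epoch + 1}: logged loss={logs['loss']:.4f}, "
            f"recomputed loss={model.evaluate(x, y, verbose=0):.4f}"
        )
    )
    model.fit(x, y, epochs=5, verbose=0, callbacks=[compare])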
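
    And to double-check that the weights really are restored, a small custom callback can snapshot the weights at the end of every epoch, so that after fit() returns the model's weights can be compared against the snapshot of the best epoch. This is only a sketch, reusing the setup of test 2 (WeightSnapshots is just an illustrative name):

    import numpy as np
    from tensorflow.keras.callbacks import Callback

    class WeightSnapshots(Callback):
        """Keep a copy of the model weights at the end of every epoch."""
        def on_train_begin(self, logs=None):
            self.snapshots = []
        def on_epoch_end(self, epoch, logs=None):
            self.snapshots.append([w.copy() for w in self.model.get_weights()])

    snap = WeightSnapshots()
    history = model.fit(x, y, epochs=100, verbose=0,
                        callbacks=[stopmon, snap], validation_data=(x2, y2))

    best_epoch = int(np.argmin(history.history['val_loss']))  # 0-based index
    same = all(np.allclose(a, b) for a, b in
               zip(model.get_weights(), snap.snapshots[best_epoch]))
    print(same)  # True: the model ends up holding the best epoch's weights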