I found that restore_best_weights=True in the EarlyStopping callback does not actually give back the best result: after training stops, evaluating the model returns a worse loss than the reported best. A simplified example with some dummy data:
import numpy as np
from tensorflow.keras.utils import set_random_seed
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping
np.random.seed(1)
set_random_seed(2)
x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 4., 2., 5.])
# Tiny model; with 5 samples there is a single batch per epoch,
# which keeps the log easy to follow.
model = Sequential()
model.add(Dense(2, input_shape=(1,), activation='tanh'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1))
model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')

# Stop after 2 epochs without improvement of the training loss and
# restore the weights from the best epoch.
stopmon = EarlyStopping(monitor='loss', patience=2, restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon])

# Evaluate the (supposedly restored) model on the same data.
res = model.evaluate(x, y, verbose=1)
print(f'best={stopmon.best:.4f}, loss={res:.4f}')
The output (on my system) is:
Epoch 1/100
1/1 - 0s - loss: 11.8290 - 434ms/epoch - 434ms/step
Epoch 2/100
1/1 - 0s - loss: 1.9091 - 0s/epoch - 0s/step
Epoch 3/100
1/1 - 0s - loss: 1.5159 - 16ms/epoch - 16ms/step
Epoch 4/100
1/1 - 0s - loss: 1.3921 - 0s/epoch - 0s/step
Epoch 5/100
1/1 - 0s - loss: 1.6787 - 0s/epoch - 0s/step
Epoch 6/100
Restoring model weights from the end of the best epoch: 4.
1/1 - 0s - loss: 2.0629 - 33ms/epoch - 33ms/step
Epoch 6: early stopping
1/1 [==============================] - 0s 100ms/step - loss: 1.6787
best=1.3921, loss=1.6787
The callback reports that the weights from epoch 4 are restored, yet model.evaluate returns 1.6787 instead of the best value 1.3921 (note that 1.6787 is the loss logged at epoch 5, not the epoch-4 value). Why does evaluation not reproduce the best loss? Is there anything extra I should do to update the model?
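For what it's worth, one way to check that the restoration itself happens would be to rerun the training above with an extra callback that snapshots the weights at the end of every epoch, then compare the snapshots against the model's weights after fit returns. A minimal sketch, reusing model, stopmon, x and y from above (WeightLogger is just an illustrative name, not a Keras class):
from tensorflow.keras.callbacks import Callback

class WeightLogger(Callback):
    # Keep a copy of the model weights at the end of every epoch.
    def __init__(self):
        super().__init__()
        self.snapshots = []

    def on_epoch_end(self, epoch, logs=None):
        self.snapshots.append([w.copy() for w in self.model.get_weights()])

wlog = WeightLogger()
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon, wlog])

restored = model.get_weights()
for i, snap in enumerate(wlog.snapshots, start=1):
    match = all(np.array_equal(a, b) for a, b in zip(snap, restored))
    print(f'epoch {i}: equals restored weights: {match}')
If the snapshot from epoch 4 matches the restored weights, the restoration worked and the discrepancy must lie in the loss values themselves.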
I use an up-to-date TensorFlow (version 2.12.0) on Windows x64 (Intel); tf.version.COMPILER_VERSION == 'MSVC 192930140'.
I think this comes down to how the training loss is calculated, because the restoration itself does work, at least when monitoring val_loss. I ran two tests.
1. Without validation:
import numpy as np
from tensorflow.keras.utils import set_random_seed
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping

np.random.seed(1)
set_random_seed(2)

# 1000 random samples -> 32 batches per epoch.
x = np.random.randn(1000)
y = np.random.randn(1000)

model = Sequential()
model.add(Dense(2, input_shape=(1,), activation='tanh'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1))
model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')

stopmon = EarlyStopping(monitor='loss', patience=2, restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon])

res = model.evaluate(x, y, verbose=1)
print(f'best={stopmon.best:.4f}, loss={res:.4f}')
The output is:
Epoch 1/100
32/32 - 0s - loss: 0.9681 - 468ms/epoch - 15ms/step
Epoch 2/100
32/32 - 0s - loss: 0.9515 - 33ms/epoch - 1ms/step
Epoch 3/100
32/32 - 0s - loss: 0.9675 - 30ms/epoch - 953us/step
Epoch 4/100
Restoring model weights from the end of the best epoch: 2.
32/32 - 0s - loss: 0.9596 - 37ms/epoch - 1ms/step
Epoch 4: early stopping
32/32 [==============================] - 0s 952us/step - loss: 1.0256
best=0.9515, loss=1.0256
You can see that the evaluated loss (1.0256) is higher than any loss logged during training, even though the best weights were restored. This is due to how the training loss is calculated: the value Keras logs for an epoch is a running average of the per-batch losses, computed while the weights are still being updated during that epoch, so it never corresponds exactly to the loss of the weights the epoch ends with.
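To make the difference visible, a small callback can re-evaluate the full training set at the end of each epoch, i.e. with the weights the epoch actually ends on, and print that next to the logged value. A sketch (EpochEndEval is my own name for this illustrative helper):
from tensorflow.keras.callbacks import Callback

class EpochEndEval(Callback):
    # Evaluate the full training set with the end-of-epoch weights and
    # print it next to the running-average loss that Keras logs.
    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y

    def on_epoch_end(self, epoch, logs=None):
        end_loss = self.model.evaluate(self.x, self.y, verbose=0)
        print(f"epoch {epoch + 1}: logged loss={logs['loss']:.4f}, "
              f"end-of-epoch loss={end_loss:.4f}")

history = model.fit(x, y, epochs=100, verbose=2,
                    callbacks=[stopmon, EpochEndEval(x, y)])
The logged value is what EarlyStopping compares, but the end-of-epoch value is what the restored weights can actually reproduce, which is why evaluate disagrees with the logged best.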
2. With a validation step:
# Same imports as in the first test.
np.random.seed(1)
set_random_seed(2)

x = np.random.randn(1000)
y = np.random.randn(1000)
x2 = np.random.randn(50)   # held-out validation set
y2 = np.random.randn(50)

model = Sequential()
model.add(Dense(2, input_shape=(1,), activation='tanh'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1))
model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')

# Monitor the validation loss instead of the training loss.
stopmon = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon], validation_data=(x2, y2))

res = model.evaluate(x2, y2, verbose=1)
print(f'best={stopmon.best:.4f}, loss={res:.4f}')
And the output is:
Epoch 1/100
32/32 - 1s - loss: 0.9681 - val_loss: 1.0496 - 626ms/epoch - 20ms/step
Epoch 2/100
32/32 - 0s - loss: 0.9515 - val_loss: 0.9901 - 57ms/epoch - 2ms/step
Epoch 3/100
32/32 - 0s - loss: 0.9675 - val_loss: 1.0150 - 57ms/epoch - 2ms/step
Epoch 4/100
Restoring model weights from the end of the best epoch: 2.
32/32 - 0s - loss: 0.9596 - val_loss: 1.0154 - 57ms/epoch - 2ms/step
Epoch 4: early stopping
2/2 [==============================] - 0s 2ms/step - loss: 0.9901
best=0.9901, loss=0.9901
You can see that in this case the values match exactly. The validation loss is computed at the end of each epoch, after all weight updates, with exactly the weights that restore_best_weights later puts back, so there is nothing to drift. In conclusion: restore_best_weights works as expected with val_loss; the apparent mismatch with the training loss is an artifact of how that loss is calculated (a running average over the batches of the epoch), not a failure to restore the weights.
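If you need the monitored training loss to be consistent with what evaluate later returns, one workaround (a sketch, at the cost of one extra pass over the training data per epoch) is to pass the training set itself as validation data and monitor val_loss, which is always computed with the end-of-epoch weights:
# Workaround sketch: evaluate the training data as "validation" so the
# monitored loss is computed after each epoch's updates, i.e. with the
# same weights that restore_best_weights will bring back.
stopmon = EarlyStopping(monitor='val_loss', patience=2,
                        restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2,
                    callbacks=[stopmon], validation_data=(x, y))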