I am using Keras for the first time on a regression problem. I have set up an early stopping callback that monitors val_loss (which is mean squared error) with patience=3. However, training stops even though val_loss has been decreasing for the last few epochs. Either there is a bug in my code, or I am misunderstanding what the callback actually does. Can anyone see what is going on? I provide the training progress and the model-building code below.
As you can see below, training stopped at epoch 8, but val_loss had been decreasing since epoch 6, so I think it should have kept running. val_loss increased only once (from epoch 5 to 6), and patience is 3.
Epoch 1/100
35849/35849 - 73s - loss: 11317667.0000 - val_loss: 7676812.0000
Epoch 2/100
35849/35849 - 71s - loss: 11095449.0000 - val_loss: 7635795.0000
Epoch 3/100
35849/35849 - 71s - loss: 11039211.0000 - val_loss: 7627178.5000
Epoch 4/100
35849/35849 - 71s - loss: 10997918.0000 - val_loss: 7602583.5000
Epoch 5/100
35849/35849 - 65s - loss: 10955304.0000 - val_loss: 7599179.0000
Epoch 6/100
35849/35849 - 59s - loss: 10914252.0000 - val_loss: 7615204.0000
Epoch 7/100
35849/35849 - 59s - loss: 10871920.0000 - val_loss: 7612452.0000
Epoch 8/100
35849/35849 - 59s - loss: 10827388.0000 - val_loss: 7603128.5000
The model is built as follows:
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras import initializers

# create model
model = Sequential()
model.add(Dense(len(predictors), input_dim=len(predictors), activation='relu', name='input',
                kernel_initializer=initializers.he_uniform(seed=seed_value)))
model.add(Dense(155, activation='relu', name='hidden1',
                kernel_initializer=initializers.he_uniform(seed=seed_value)))
model.add(Dense(1, activation='linear', name='output',
                kernel_initializer=initializers.he_uniform(seed=seed_value)))
callback = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')

# Fit the model
history = model.fit(X, y, validation_split=0.2, epochs=100,
                    batch_size=50, verbose=2, callbacks=[callback])
After experimenting with some of the hyperparameters, such as the activation functions, I keep running into the same problem, although it doesn't always stop at epoch 8. I have also tried changing the patience value.
Details: Ubuntu 18.04, TensorFlow 2.6.0, Python 3.8.5
You are misunderstanding how Keras defines an improvement. You are right that val_loss decreased in epochs 7 and 8 and increased only in epoch 6. What you are missing is that neither epoch 7 nor epoch 8 improved on the best value seen so far, which was set in epoch 5 (7599179.0000). EarlyStopping tracks that running best and waits patience=3 epochs for something to beat it; it does NOT check whether val_loss improved relative to the previous epoch. When epoch 8 still failed to dip below the epoch-5 value, the callback terminated training (and, because you set restore_best_weights=True, handed you back the weights from epoch 5).
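To make the bookkeeping concrete, here is a minimal sketch of the patience logic, replayed against your logged val_loss numbers. This is an illustration of the documented behavior with the default min_delta=0, not the actual Keras source:

# Replay EarlyStopping's patience logic against the logged val_loss values
val_losses = [7676812.0, 7635795.0, 7627178.5, 7602583.5,
              7599179.0, 7615204.0, 7612452.0, 7603128.5]

best, best_epoch, wait, patience = float('inf'), 0, 0, 3
for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best:      # "improvement" means beating the running best
        best, best_epoch, wait = val_loss, epoch, 0
    else:                    # counts against patience even if val_loss
        wait += 1            # fell relative to the previous epoch
        if wait >= patience:
            print(f"stopped at epoch {epoch}; best val_loss {best} from epoch {best_epoch}")
            break
# prints: stopped at epoch 8; best val_loss 7599179.0 from epoch 5

Note that min_delta (default 0) only sets how much a new value must beat the running best by in order to reset the counter; it never compares against the previous epoch.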