python-3.x keras checkpointing

Keras callback keeps skipping checkpoint saves, claiming val_acc is missing


I'm going to train some larger models and want to save intermediate results.

Therefore, I am trying to use checkpoints to save the best model after each epoch.

This is my code:

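from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

model = Sequential()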
model.add(LSTM(700, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(700, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(700))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Save the checkpoint in the /output folder
filepath = "output/text-gen-best.hdf5"

# Keep only a single checkpoint, the best over test accuracy.
checkpoint = ModelCheckpoint(filepath,
                            monitor='val_acc',
                            verbose=1,
                            save_best_only=True,
                            mode='max')
model.fit(X_modified, Y_modified, epochs=100, batch_size=50, callbacks=[checkpoint])

But I am still getting the warning after the first epoch:

/usr/local/lib/python3.6/site-packages/keras/callbacks.py:432: RuntimeWarning: Can save best model only with val_acc available, skipping.
  'skipping.' % (self.monitor), RuntimeWarning)

In other SO questions (e.g. Unable to save weights while using pre-trained VGG16 model), the solution was to add metrics=['accuracy'] to the model, but here the warning still remains.


Solution

  • You are trying to checkpoint the model using the following code:

    # Save the checkpoint in the /output folder
    filepath = "output/text-gen-best.hdf5"
    
    # Keep only a single checkpoint, the best over test accuracy.
    checkpoint = ModelCheckpoint(filepath,
                                monitor='val_acc',
                                verbose=1,
                                save_best_only=True,
                                mode='max')
    

    ModelCheckpoint uses the monitor argument to decide whether to save the model. In your code it is val_acc, so the weights are saved only when val_acc increases.

    Now, in your fit call,

    model.fit(X_modified, Y_modified, epochs=100, batch_size=50, callbacks=[checkpoint])
    

    you haven't provided any validation data. Without validation data, val_acc is never computed, so ModelCheckpoint has no monitored value to compare and skips saving.
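
    The skip can be reproduced without Keras. The sketch below is a simplified stand-in for the callback's per-epoch decision, not Keras's actual source; `should_save` and the sample `logs` dictionaries are hypothetical names and values chosen for illustration. It assumes the callback looks up the monitored name in the metrics of the finished epoch and gives up when that name is absent.

```python
import warnings


def should_save(logs, monitor, best):
    """Simplified stand-in for a save-best-only check with mode='max'.

    Returns (save, new_best). `logs` maps metric names to the values
    computed for the epoch that just finished.
    """
    current = logs.get(monitor)
    if current is None:
        # The monitored metric was never computed, so there is nothing to compare.
        warnings.warn(
            "Can save best model only with %s available, skipping." % monitor,
            RuntimeWarning,
        )
        return False, best
    if current > best:
        return True, current
    return False, best


# Hypothetical metrics for one epoch trained without validation data:
# only training metrics are present.
logs_without_validation = {"loss": 2.91, "acc": 0.21}

# Hypothetical metrics for one epoch trained with validation data.
logs_with_validation = {"loss": 2.91, "acc": 0.21, "val_loss": 3.02, "val_acc": 0.19}

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo quiet
    skipped = should_save(logs_without_validation, "val_acc", float("-inf"))

saved = should_save(logs_with_validation, "val_acc", float("-inf"))
```

    Here `skipped` reports that nothing was saved, while `saved` reports a save, because only the second dictionary contains a val_acc entry.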

    To checkpoint based on val_acc, you must provide some validation data, like this:

    model.fit(X_modified, Y_modified, validation_data=(X_valid, y_valid), epochs=100, batch_size=50, callbacks=[checkpoint])
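
    If no separate validation set exists yet, one can be carved out of the training data. The helper below is a minimal sketch; `holdout_split` is a hypothetical name, and it assumes that taking the last 20% of the samples gives a usable validation set. It relies only on `len` and slicing, so it should work on NumPy arrays such as X_modified as well as on plain lists.

```python
def holdout_split(X, Y, fraction=0.2):
    """Split X and Y into training and validation parts.

    The last `fraction` of the samples becomes the validation set.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of samples")
    n_valid = int(len(X) * fraction)
    if n_valid == 0:
        raise ValueError("fraction is too small for this many samples")
    split = len(X) - n_valid
    return X[:split], Y[:split], X[split:], Y[split:]


# Toy data standing in for X_modified and Y_modified.
X_toy = list(range(10))
Y_toy = [x * 2 for x in X_toy]
X_train, y_train, X_valid, y_valid = holdout_split(X_toy, Y_toy)
```

    The results would then be passed as model.fit(X_train, y_train, validation_data=(X_valid, y_valid), ...). Keras's fit also accepts validation_split=0.2, which holds out the last 20% of the samples for you.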
    

    If you don't want to use validation data, for whatever reason, but still want checkpointing, change ModelCheckpoint to monitor acc or loss instead, like this:

    # Save the checkpoint in the /output folder
    filepath = "output/text-gen-best.hdf5"
    
    # Keep only a single checkpoint, the best over training accuracy.
    checkpoint = ModelCheckpoint(filepath,
                                monitor='acc',
                                verbose=1,
                                save_best_only=True,
                                mode='max')
    

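    Keep in mind that you have to change mode to min if you are going to monitor the loss. Also note that newer Keras releases may report the metric as accuracy and val_accuracy rather than acc and val_acc, so the monitor name has to match whatever appears in your training log.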
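
    The mode matters because it sets the direction of "better". A minimal sketch of that idea, with `comparison_for` as a hypothetical helper rather than part of Keras, and made-up loss values:

```python
import operator


def comparison_for(mode):
    """Return (is_better, initial_best) for a 'min' or 'max' mode."""
    if mode == "min":
        # Lower is better (e.g. loss), so start from +infinity.
        return operator.lt, float("inf")
    if mode == "max":
        # Higher is better (e.g. accuracy), so start from -infinity.
        return operator.gt, float("-inf")
    raise ValueError("mode must be 'min' or 'max'")


def epochs_saved(values, mode):
    """Return the 1-based epochs at which a best-only checkpoint would be written."""
    is_better, best = comparison_for(mode)
    saved = []
    for epoch, value in enumerate(values, start=1):
        if is_better(value, best):
            best = value
            saved.append(epoch)
    return saved


# Hypothetical loss values over four epochs.
losses = [2.9, 2.4, 2.6, 2.1]

saved_with_min = epochs_saved(losses, "min")
saved_with_max = epochs_saved(losses, "max")
```

    With mode 'min' the checkpoint is refreshed whenever the loss drops to a new low. With mode 'max' on the same values, only the first epoch would ever be saved, because the loss never exceeds its starting value.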