I'm using Google Colab and saving the weights to my Google Drive.
Training:
def train(model, network_input, network_output):
    """ train the neural network """
    filepath = "/content/gdrive/MyDrive/weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
    checkpoint = ModelCheckpoint(
        filepath,
        monitor='loss',
        verbose=0,
        save_best_only=True,
        mode='min'
    )
    callbacks_list = [checkpoint]
    model.fit(network_input, network_output, epochs=200, batch_size=128, callbacks=callbacks_list)
After training for some time, I have the weights saved in my Drive. [screenshot: weights in my drive]
Then I resume training without modifying my functions, and the output cell looks like this: [screenshot: output cell]
How can I tell whether training resumed from the best weights so far, i.e. "weights-improvement-06-4.1851-bigger.hdf5", or just restarted from the beginning? If it is training from the saved weights, shouldn't the output show that in some way, for example by continuing the epoch count from where it left off (Epoch 4/200 instead of Epoch 1/200)?
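If you are still using the same instantiated model object (i.e. you haven't instantiated a new one), it will resume training from where it left off. It won't start over. Note that it continues from the weights currently held in memory (those from the last completed epoch, which are not necessarily the best checkpoint), and the progress output still starts again at Epoch 1/200 because each call to fit counts epochs from zero by default.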
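If you want the epoch counter to continue rather than restart, `fit` accepts an `initial_epoch` argument. The sketch below is one way to recover that number from a checkpoint filename in the format used in the question. The helper name `epoch_from_checkpoint` is hypothetical, and the regular expression assumes the `weights-improvement-{epoch:02d}-...` naming pattern.

```python
import re

# Hypothetical helper (not part of Keras): recover the 1-based epoch number
# that ModelCheckpoint wrote into a name such as
# "weights-improvement-06-4.1851-bigger.hdf5".
_EPOCH_RE = re.compile(r"weights-improvement-(\d+)-")


def epoch_from_checkpoint(filename):
    """Return the epoch number embedded in a checkpoint filename."""
    match = _EPOCH_RE.search(filename)
    if match is None:
        raise ValueError(f"no epoch number found in {filename!r}")
    return int(match.group(1))


# Usage (assumes `model`, `network_input`, `network_output` and
# `callbacks_list` exist as in the question):
# start = epoch_from_checkpoint("weights-improvement-06-4.1851-bigger.hdf5")
# model.fit(network_input, network_output, epochs=200, batch_size=128,
#           callbacks=callbacks_list, initial_epoch=start)
```

With `initial_epoch=6` and `epochs=200`, the progress output should begin at Epoch 7/200, since `epochs` is treated as the index of the final epoch rather than the number of additional epochs to run.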
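However, if you want to instantiate a new model using the same config and start from a previously saved set of weights (a checkpoint), you can use TensorFlow's latest_checkpoint to find the most recent checkpoint in your directory and then load those weights into the model. Be aware that latest_checkpoint relies on the `checkpoint` index file that TensorFlow-format checkpoints write; for standalone .hdf5 files such as the ones in the question it may return None, in which case you can pass the .hdf5 path directly to load_weights.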
import os
import tensorflow as tf

# find the most recent TensorFlow-format checkpoint in the directory
last_ckpt = tf.train.latest_checkpoint(os.path.join('my', 'checkpoint', 'directory'))
# `model` is the newly instantiated model using the same config
model.load_weights(last_ckpt)
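For the .hdf5 files written by the ModelCheckpoint in the question, one option is to pick the file by the loss value embedded in its name. This is a minimal sketch, assuming the `weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5` pattern; the helper names `best_checkpoint` and `best_checkpoint_in` are hypothetical and not part of Keras or TensorFlow.

```python
import glob
import os
import re

# Matches the loss field in names like "weights-improvement-06-4.1851-bigger.hdf5".
_LOSS_RE = re.compile(r"weights-improvement-\d+-(\d+\.\d+)-bigger\.hdf5$")


def best_checkpoint(paths):
    """Return the path with the lowest loss in its name, or None if none match."""
    best_path, best_loss = None, float("inf")
    for path in paths:
        match = _LOSS_RE.search(os.path.basename(path))
        if match is None:
            continue  # skip files that do not follow the naming pattern
        loss = float(match.group(1))
        if loss < best_loss:
            best_path, best_loss = path, loss
    return best_path


def best_checkpoint_in(directory):
    """Apply best_checkpoint to every .hdf5 file found in `directory`."""
    return best_checkpoint(glob.glob(os.path.join(directory, "*.hdf5")))


# Usage (assumes `model` is a newly built model with the same architecture):
# path = best_checkpoint_in("/content/gdrive/MyDrive")
# if path is not None:
#     model.load_weights(path)
```

Because the callback in the question uses save_best_only=True with mode='min', the lowest-loss file should also be the most recently written one, so selecting by loss and selecting by recency are expected to agree here.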