I'm fine-tuning a transformer model for text classification in PyTorch using the Hugging Face Trainer. I would like to log both the training and the validation loss for each epoch of training, so that I can assess when the model starts to overfit to the training data (i.e. the point at which the training loss keeps decreasing while the validation loss plateaus or increases, reflecting the bias-variance tradeoff).
Here are the training arguments for my Hugging Face Trainer:
training_arguments = TrainingArguments(
    output_dir=os.path.join(MODEL_DIR, f'{TODAYS_DATE}_multicls_cls'),
    run_name=f'{TODAYS_DATE}_multicls_cls',
    overwrite_output_dir=True,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    num_train_epochs=7.0,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    optim='adamw_torch',
    learning_rate=LEARNING_RATE,
)
The Trainer is set to evaluate every epoch, as desired, but the training loss is only logged every 500 steps, which is the default logging_steps. You can see this in trainer.state.log_history after training:
{'eval_loss': 6.346338748931885, 'eval_f1': 0.2146690518783542, 'eval_runtime': 1.2777, 'eval_samples_per_second': 31.306, 'eval_steps_per_second': 31.306, 'epoch': 1.0, 'step': 160}
{'eval_loss': 5.505970001220703, 'eval_f1': 0.23817863397548159, 'eval_runtime': 1.5768, 'eval_samples_per_second': 25.367, 'eval_steps_per_second': 25.367, 'epoch': 2.0, 'step': 320}
{'eval_loss': 5.21959114074707, 'eval_f1': 0.2233676975945017, 'eval_runtime': 1.3016, 'eval_samples_per_second': 30.732, 'eval_steps_per_second': 30.732, 'epoch': 3.0, 'step': 480}
{'loss': 6.1108, 'learning_rate': 2.767857142857143e-05, 'epoch': 3.12, 'step': 500}
{'eval_loss': 5.014569282531738, 'eval_f1': 0.24625623960066553, 'eval_runtime': 1.3961, 'eval_samples_per_second': 28.652, 'eval_steps_per_second': 28.652, 'epoch': 4.0, 'step': 640}
{'eval_loss': 5.090881824493408, 'eval_f1': 0.2212643678160919, 'eval_runtime': 1.2708, 'eval_samples_per_second': 31.477, 'eval_steps_per_second': 31.477, 'epoch': 5.0, 'step': 800}
{'eval_loss': 4.950728416442871, 'eval_f1': 0.23750000000000002, 'eval_runtime': 1.298, 'eval_samples_per_second': 30.816, 'eval_steps_per_second': 30.816, 'epoch': 6.0, 'step': 960}
{'loss': 3.8989, 'learning_rate': 5.357142857142857e-06, 'epoch': 6.25, 'step': 1000}
{'eval_loss': 4.940125465393066, 'eval_f1': 0.24444444444444444, 'eval_runtime': 1.4609, 'eval_samples_per_second': 27.38, 'eval_steps_per_second': 27.38, 'epoch': 7.0, 'step': 1120}
{'train_runtime': 80.7323, 'train_samples_per_second': 13.873, 'train_steps_per_second': 13.873, 'total_flos': 73700199874560.0, 'train_loss': 4.81386468069894, 'epoch': 7.0, 'step': 1120}
How can I set the training arguments to log the training loss every epoch, just like the validation loss? There is no equivalent of evaluation_strategy='epoch' for training in TrainingArguments.
To log the training loss every epoch, set logging_strategy='epoch' in your TrainingArguments, so the logging cadence matches your evaluation_strategy.
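For example, the amended arguments could look like this (a minimal sketch reusing the same constants from your question):

training_arguments = TrainingArguments(
    output_dir=os.path.join(MODEL_DIR, f'{TODAYS_DATE}_multicls_cls'),
    run_name=f'{TODAYS_DATE}_multicls_cls',
    overwrite_output_dir=True,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    logging_strategy='epoch',   # log the training loss at the end of every epoch
    num_train_epochs=7.0,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    optim='adamw_torch',
    learning_rate=LEARNING_RATE,
)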
Now I get:
{'loss': 7.1773, 'learning_rate': 4.2857142857142856e-05, 'epoch': 1.0, 'step': 160}
{'eval_loss': 6.232218265533447, 'eval_f1': 0.20766773162939295, 'eval_runtime': 1.2916, 'eval_samples_per_second': 30.97, 'eval_steps_per_second': 30.97, 'epoch': 1.0, 'step': 160}
{'loss': 6.3841, 'learning_rate': 3.571428571428572e-05, 'epoch': 2.0, 'step': 320}
{'eval_loss': 5.86290979385376, 'eval_f1': 0.2006269592476489, 'eval_runtime': 1.3634, 'eval_samples_per_second': 29.339, 'eval_steps_per_second': 29.339, 'epoch': 2.0, 'step': 320}
{'loss': 5.5212, 'learning_rate': 2.857142857142857e-05, 'epoch': 3.0, 'step': 480}
{'eval_loss': 5.343527793884277, 'eval_f1': 0.24319419237749546, 'eval_runtime': 1.29, 'eval_samples_per_second': 31.008, 'eval_steps_per_second': 31.008, 'epoch': 3.0, 'step': 480}
{'loss': 4.7184, 'learning_rate': 2.1428571428571428e-05, 'epoch': 4.0, 'step': 640}
{'eval_loss': 5.131855487823486, 'eval_f1': 0.23588039867109634, 'eval_runtime': 1.3336, 'eval_samples_per_second': 29.993, 'eval_steps_per_second': 29.993, 'epoch': 4.0, 'step': 640}
{'loss': 4.0205, 'learning_rate': 1.4285714285714285e-05, 'epoch': 5.0, 'step': 800}
{'eval_loss': 4.972315788269043, 'eval_f1': 0.22551928783382788, 'eval_runtime': 1.2714, 'eval_samples_per_second': 31.462, 'eval_steps_per_second': 31.462, 'epoch': 5.0, 'step': 800}
{'loss': 3.5411, 'learning_rate': 7.142857142857143e-06, 'epoch': 6.0, 'step': 960}
{'eval_loss': 4.964015960693359, 'eval_f1': 0.23100303951367776, 'eval_runtime': 1.2783, 'eval_samples_per_second': 31.292, 'eval_steps_per_second': 31.292, 'epoch': 6.0, 'step': 960}
{'loss': 3.2564, 'learning_rate': 0.0, 'epoch': 7.0, 'step': 1120}
{'eval_loss': 4.895078182220459, 'eval_f1': 0.22585438335809802, 'eval_runtime': 1.3362, 'eval_samples_per_second': 29.935, 'eval_steps_per_second': 29.935, 'epoch': 7.0, 'step': 1120}
{'train_runtime': 81.2849, 'train_samples_per_second': 13.779, 'train_steps_per_second': 13.779, 'total_flos': 73700199874560.0, 'train_loss': 4.945595060076032, 'epoch': 7.0, 'step': 1120}
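With both losses now logged on the same epoch boundaries, you can pull them out of trainer.state.log_history and compare them directly to spot where overfitting begins. Here is a minimal sketch (assuming trainer is your finished Trainer instance; the variable names are just for illustration):

# Collect the per-epoch training and validation losses from the log history.
history = trainer.state.log_history

train_loss = {entry['epoch']: entry['loss']
              for entry in history if 'loss' in entry}
eval_loss = {entry['epoch']: entry['eval_loss']
             for entry in history if 'eval_loss' in entry}

# Print them side by side; the epoch where eval loss stops improving while
# train loss keeps falling is your overfitting point.
for epoch in sorted(eval_loss):
    print(f"epoch {epoch:.0f}: "
          f"train={train_loss.get(epoch, float('nan')):.4f}  "
          f"eval={eval_loss[epoch]:.4f}")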