[SOLVED] Can we perform operations at every epoch during YOLOv8 training

I want to do some calculations at every epoch during training, but Ultralytics YOLOv8 doesn't seem to allow that: training runs for all epochs in one go.

I want to compare the validation loss at every epoch with the previous epochs for early stopping. Ultralytics has early stopping through the 'patience' parameter, but it uses mAP as the comparison metric rather than any kind of loss, and changing the existing code to do otherwise seems complicated. So I want to do the validation loss comparison myself at every epoch to avoid overfitting. I already have the validation loss for every epoch, but I can't control early stopping. How can I do this?

Solution

Use callbacks to perform operations at every epoch during YOLOv8 training.

Ultralytics callbacks are specialized entry points triggered during key stages of model operations like training, validation, exporting, and prediction. These callbacks allow for custom functionality at specific points in the process, enabling enhancements and modifications to the workflow. Each callback accepts a Trainer, Validator, or Predictor object, depending on the operation type.
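To make the mechanism concrete, here is a toy model of how event callbacks and a stop flag interact. ToyTrainer, add_callback, run_callbacks and stop_on_increase are hypothetical stand-ins written for illustration, not Ultralytics code. As far as I can tell, the real trainer follows a similar pattern: it looks up the functions registered for an event name, calls each one with the trainer, and checks its stop flag once per epoch.

```python
from collections import defaultdict


class ToyTrainer:
    """Hypothetical stand-in for a trainer; NOT the Ultralytics class."""

    def __init__(self, val_losses):
        self.val_losses = val_losses          # pretend validation loss per epoch
        self.callbacks = defaultdict(list)    # event name -> registered functions
        self.stop = False                     # flag that callbacks may set
        self.epoch = 0
        self.history = []                     # losses "logged" so far

    def add_callback(self, event, func):
        self.callbacks[event].append(func)

    def run_callbacks(self, event):
        # call every function registered for this event, passing the trainer
        for func in self.callbacks[event]:
            func(self)

    def train(self):
        for epoch, loss in enumerate(self.val_losses):
            self.epoch = epoch
            self.history.append(loss)         # stands in for "train + val"
            self.run_callbacks("on_fit_epoch_end")
            if self.stop:                     # checked once per epoch
                break
        return self.epoch


def stop_on_increase(trainer):
    # hypothetical rule: stop as soon as the loss goes up
    if len(trainer.history) >= 2 and trainer.history[-1] > trainer.history[-2]:
        trainer.stop = True


trainer = ToyTrainer([0.9, 0.7, 0.6, 0.65, 0.5])
trainer.add_callback("on_fit_epoch_end", stop_on_increase)
last_epoch = trainer.train()
print(last_epoch)  # 3
```

The loop never reaches the fifth loss: the callback sets the flag at epoch 3, and the trainer breaks out after running the callbacks for that epoch.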

In this case, define a callback for the 'on_fit_epoch_end' event, which fires at the end of each fit epoch (train + val). To stop training once the relevant condition is met, set the trainer's stop flag: trainer.stop = True. Here is a simple example of the task you have described:

import pandas as pd
from ultralytics import YOLO

# 'on_fit_epoch_end' is called at the end of each fit epoch (train + val),
# after that epoch's metrics have been appended to results.csv
def on_fit_epoch_end(trainer):
    # read the metrics logged so far from results.csv
    results = pd.read_csv(trainer.csv)
    # some Ultralytics versions pad the column names with spaces, so strip them
    results.columns = results.columns.str.strip()
    # after the first epoch there is nothing to compare against yet
    if len(results) < 2:
        return
    # the last row is the current epoch, the row before it is the previous epoch
    current_loss = results['val/box_loss'].iloc[-1]
    previous_loss = results['val/box_loss'].iloc[-2]

    print(current_loss, previous_loss)

    # early stop logic: stop as soon as the validation box loss increases
    if current_loss > previous_loss:
        print('stopping')
        # setting the trainer's stop flag ends training after this epoch
        trainer.stop = True

model = YOLO("yolov8n.pt")

# register the custom callback
model.add_callback("on_fit_epoch_end", on_fit_epoch_end)

# train as usual
model.train(data='dataset.yaml', epochs=100)
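The rule in the example stops at the first epoch in which the validation box loss goes up. That is effectively a patience of 1 and can react to ordinary epoch-to-epoch noise. Below is a sketch of a more forgiving rule, closer in spirit to the built-in 'patience' parameter but driven by validation loss. ValLossStopper, last_val_box_loss, patience and min_delta are hypothetical names chosen for this sketch. It also assumes that trainer.csv points at results.csv and that setting trainer.stop ends training, as in the example. It reads the CSV with the standard csv module and strips padded column names, since some versions pad them.

```python
import csv
import math


class ValLossStopper:
    """Hypothetical helper: signal a stop when the validation loss has not
    improved by more than min_delta for `patience` consecutive epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def update(self, loss):
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss          # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1      # no meaningful improvement this epoch
        return self.bad_epochs >= self.patience


def last_val_box_loss(csv_path):
    """Return 'val/box_loss' from the last row of a results.csv file."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # strip the padding that some versions add to the column names
    last = {key.strip(): value for key, value in rows[-1].items()}
    return float(last["val/box_loss"])  # float() tolerates padded values


stopper = ValLossStopper(patience=3, min_delta=0.001)


def on_fit_epoch_end(trainer):
    # assumes trainer.csv and trainer.stop behave as in the example above
    if stopper.update(last_val_box_loss(trainer.csv)):
        trainer.stop = True
```

You would register it exactly like the original callback, with model.add_callback("on_fit_epoch_end", on_fit_epoch_end). Because the stopper keeps its state between calls, create a fresh ValLossStopper for each training run.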

I haven't found out how to get the loss values directly from the trainer or validator objects. trainer.loss and trainer.validator.loss return tensors with raw values: one value for trainer.loss and three values for trainer.validator.loss. Since I don't know how to process these values, I used the results.csv file, where the same data are logged in a more readable form.
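This is how I understand those raw values from reading the 8.x source. It is an assumption, so check it against your installed version. For detection, the three entries of trainer.validator.loss appear to be the box, cls and dfl losses summed over all validation batches. The values written to results.csv appear to be those sums divided by the number of validation batches. The hypothetical helper named_val_losses below does that conversion on a plain list, so the arithmetic is easy to check.

```python
def named_val_losses(summed_losses, num_batches,
                     names=("box_loss", "cls_loss", "dfl_loss"), prefix="val"):
    """Hypothetical helper: turn per-epoch summed loss components into
    per-batch means keyed like the results.csv columns ('val/box_loss', ...).

    Assumes the components are ordered box, cls, dfl, as for detection models.
    """
    if len(summed_losses) != len(names):
        raise ValueError(f"expected {len(names)} loss values, got {len(summed_losses)}")
    if num_batches <= 0:
        raise ValueError("num_batches must be positive")
    return {f"{prefix}/{name}": float(total) / num_batches
            for name, total in zip(names, summed_losses)}


# e.g. sums accumulated over 4 validation batches
print(named_val_losses([4.0, 2.0, 6.0], num_batches=4))
# {'val/box_loss': 1.0, 'val/cls_loss': 0.5, 'val/dfl_loss': 1.5}
```

Inside on_fit_epoch_end, this would presumably be called as named_val_losses(trainer.validator.loss.tolist(), len(trainer.validator.dataloader)), assuming those attributes behave as described. In the versions I have looked at, the trainer also appears to keep the already averaged values in trainer.metrics under keys such as 'val/box_loss'. That may be simpler than either reading the CSV or converting the tensor, but it is likewise worth verifying.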