I have split the data into train and test sets (0.85 train), and here is my loss visualization code:
import pandas as pd
import matplotlib.pyplot as plt

metrics_df = pd.read_json("./output/metrics.json", orient="records", lines=True)
mdf = metrics_df.sort_values("iteration")
mdf.head(10).T  # inspect the first few rows (displays in a notebook)

fig, ax = plt.subplots()

# Training loss: rows that have a total_loss value
mdf1 = mdf[~mdf["total_loss"].isna()]
ax.plot(mdf1["iteration"], mdf1["total_loss"], c="C0", label="train")

# Validation loss, if the column exists
if "validation_loss" in mdf.columns:
    mdf2 = mdf[~mdf["validation_loss"].isna()]
    ax.plot(mdf2["iteration"], mdf2["validation_loss"], c="C1", label="validation")

# ax.set_ylim([0, 0.5])
ax.legend()
ax.set_title("Loss curve")
plt.show()
How would I include that validation_loss column in my metrics file? My outputs are like this:
The simplest way to get the validation loss written into the metrics.json file is to add a hook to the trainer that calculates the loss on the validation set during training.
I have successfully used the LossEvalHook
class from here in my work.
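In case that link becomes unavailable, the sketch below approximates what such a hook can look like. It is an outline based on my understanding of the linked class rather than a verbatim copy; the scalar name validation_loss is what gets written to metrics.json and what the plotting code above expects.

import numpy as np
import torch
import detectron2.utils.comm as comm
from detectron2.engine.hooks import HookBase


# A condensed approximation of the referenced LossEvalHook (not a verbatim copy).
class LossEvalHook(HookBase):
    """Periodically computes the loss on a given data loader and logs it as 'validation_loss'."""

    def __init__(self, eval_period, model, data_loader):
        self._period = eval_period
        self._model = model
        self._data_loader = data_loader

    def _get_loss(self, data):
        # The model is still in training mode here, so it returns a dict of losses
        loss_dict = self._model(data)
        loss_dict = {
            k: v.detach().cpu().item() if isinstance(v, torch.Tensor) else float(v)
            for k, v in loss_dict.items()
        }
        return sum(loss_dict.values())

    def _do_loss_eval(self):
        losses = []
        with torch.no_grad():
            for inputs in self._data_loader:
                losses.append(self._get_loss(inputs))
        # put_scalar is what the writer hook later flushes to metrics.json
        self.trainer.storage.put_scalar("validation_loss", np.mean(losses))
        comm.synchronize()

    def after_step(self):
        next_iter = self.trainer.iter + 1
        is_final = next_iter == self.trainer.max_iter
        if is_final or (self._period > 0 and next_iter % self._period == 0):
            self._do_loss_eval()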
The example code below shows how to use it to create a custom trainer containing a hook for calculating the validation loss every 100 iterations. This code assumes that the validation set is registered and is passed via the cfg.DATASETS.TEST
config parameter.
Also, note that the hook that writes to the metrics.json file is the last element in the list of hooks returned by the DefaultTrainer.build_hooks
method. In order to get the validation loss to also be written into the file, the hook is inserted before the writer hook in the code below.
from detectron2.data import DatasetMapper, build_detection_test_loader
from detectron2.engine import DefaultTrainer
from LossEvalHook import LossEvalHook


class CustomTrainer(DefaultTrainer):
    """
    Custom trainer deriving from the "DefaultTrainer".
    Overloads build_hooks to add a hook that calculates the loss on the validation set during training.
    """
    def build_hooks(self):
        hooks = super().build_hooks()
        # Insert before the writer hook (the last element), so that the
        # validation loss also gets written to metrics.json
        hooks.insert(-1, LossEvalHook(
            100,  # frequency of calculation - every 100 iterations here
            self.model,
            build_detection_test_loader(
                self.cfg,
                self.cfg.DATASETS.TEST[0],
                DatasetMapper(self.cfg, True)
            )
        ))
        return hooks
The custom trainer can then be used for training instead of the DefaultTrainer.
trainer = CustomTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
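Once training has run for at least 100 iterations, rows containing validation_loss should start appearing in ./output/metrics.json, so the plotting code from the question should pick them up unchanged. As a quick sanity check (assuming the same output path as above):

import pandas as pd

# Confirm the new column made it into the metrics file
metrics_df = pd.read_json("./output/metrics.json", orient="records", lines=True)
print("validation_loss" in metrics_df.columns)  # expected: True after training with CustomTrainer
print(metrics_df.loc[~metrics_df["validation_loss"].isna(), ["iteration", "validation_loss"]].tail())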