I am training a RoBERTa model for a new language, and it takes a few hours to train on my data. So I think it is a good idea to save the model during training so that I can continue training from where it stopped next time.
I am using the torch library and a Google Colab GPU to train the model.
Here is my Colab notebook: https://colab.research.google.com/drive/1jOYCaLdxYRwGMqMciG6c3yPYZAsZRySZ?usp=sharing
You can use the `Trainer` from `transformers` to train the model. The `Trainer` also needs you to specify the `TrainingArguments`, which allow you to save checkpoints of the model while training.
Some of the parameters you can set when creating the `TrainingArguments` are:

- `save_strategy`: The checkpoint save strategy to adopt during training. Possible values are:
  - `"no"`: No save is done during training.
  - `"epoch"`: Save is done at the end of each epoch.
  - `"steps"`: Save is done every `save_steps`.
- `save_steps`: Number of update steps between two checkpoint saves if `save_strategy="steps"`.
- `save_total_limit`: If a value is passed, limits the total number of checkpoints, deleting the older checkpoints in `output_dir`.
- `load_best_model_at_end`: Whether or not to load the best model found during training at the end of training.
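For example, a minimal runnable sketch could look like this (the tiny config, `ToyDataset`, and the `./roberta-checkpoints` path are stand-ins for your own model, tokenized corpus, and output directory):

```python
import torch
from transformers import (RobertaConfig, RobertaForMaskedLM,
                          Trainer, TrainingArguments)

# Tiny stand-in model and dataset so the sketch runs end to end;
# substitute your own config, tokenizer, and corpus.
model = RobertaForMaskedLM(RobertaConfig(
    vocab_size=1000, hidden_size=64, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=128))

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return 64
    def __getitem__(self, idx):
        ids = torch.randint(4, 1000, (32,))
        return {"input_ids": ids, "labels": ids.clone()}

training_args = TrainingArguments(
    output_dir="./roberta-checkpoints",  # checkpoints are written here
    save_strategy="steps",               # save a checkpoint every `save_steps` steps
    save_steps=4,
    save_total_limit=2,                  # keep only the two newest checkpoints
    num_train_epochs=1,
    per_device_train_batch_size=8,
)

trainer = Trainer(model=model, args=training_args, train_dataset=ToyDataset())
trainer.train()
```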
One important thing about `load_best_model_at_end` is that when it is set to `True`, `save_strategy` needs to be the same as `eval_strategy`, and in the case it is `"steps"`, `save_steps` must be a round multiple of `eval_steps`.
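Then, in your next session, you can continue from where training stopped by passing `resume_from_checkpoint` to `Trainer.train`, which restores the model weights as well as the optimizer and scheduler state:

```python
# Resume from the latest checkpoint found in `output_dir` ...
trainer.train(resume_from_checkpoint=True)

# ... or from a specific checkpoint directory (example path)
trainer.train(resume_from_checkpoint="./roberta-checkpoints/checkpoint-8")
```

Note that Colab's local disk is cleared when the runtime disconnects, so for this to work across sessions, `output_dir` should point somewhere persistent, such as a mounted Google Drive folder.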