I'm training a model with the following parameters:
Seq2SeqTrainingArguments(
    output_dir="./out",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=8,
    learning_rate=1.25e-5,
    warmup_steps=1,
    save_total_limit=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    num_train_epochs=5,
    gradient_checkpointing=True,
    fp16=True,
    predict_with_generate=True,
    generation_max_length=225,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=False,
)
I assumed that warmup_steps=1 would fix (i.e. hold constant) the learning rate. However, after training finished I looked at the file trainer_state.json, and it seems that the learning rate is not fixed.
Here are the logged values of learning_rate and step:

learning_rate    step
1.0006e-05       1033
7.5062e-06       2066
5.0058e-06       3099
2.5053e-06       4132
7.2618e-09       5165
It seems that the learning rate is not fixed at 1.25e-5 (after step 1). What am I missing? How do I fix the learning rate?
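For reference, this is roughly how I pulled the numbers above out of trainer_state.json (it just reads the log_history entries the Trainer writes into output_dir; not part of the training script itself):

import json

with open("./out/trainer_state.json") as f:
    state = json.load(f)

# Each training log entry contains the learning rate and the global step.
for entry in state["log_history"]:
    if "learning_rate" in entry:
        print(entry["learning_rate"], entry["step"])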
A warm-up is, in general, an increase of the learning rate: it starts at 0 and increases linearly over warmup_steps (here, 1 step) up to the specified learning rate of 1.25e-5.
Afterwards, by default, a linear learning-rate scheduler (a cosine one in other configurations) decays your learning rate towards 0 over the remaining training steps.
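You can reproduce the numbers in your trainer_state.json with a minimal sketch of that default schedule. This is an assumption-laden toy (a dummy optimizer driving the scheduler, total_steps=5165 taken from the last logged step), not the Trainer itself:

import torch
from transformers import get_linear_schedule_with_warmup

base_lr = 1.25e-5
total_steps = 5165  # taken from the last logged step in your table

# Dummy optimizer just to drive the scheduler.
opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=base_lr)
# Default Trainer schedule: linear warmup over 1 step, then linear decay to 0.
sched = get_linear_schedule_with_warmup(opt, num_warmup_steps=1, num_training_steps=total_steps)

for step in range(1, total_steps + 1):
    opt.step()
    sched.step()
    if step in (1033, 2066, 3099, 4132, 5165):
        print(step, sched.get_last_lr()[0])
# Prints roughly 1.0e-05, 7.5e-06, 5.0e-06, 2.5e-06 and ~0,
# i.e. values close to the ones you see in trainer_state.json.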
To disable the decay, set lr_scheduler_type='constant'.
If I recall correctly, this also disables the warmup.
If you want a warmup followed by a constant rate, use lr_scheduler_type='constant_with_warmup' instead.
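For example (only the relevant arguments shown; everything else stays as in your original call):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./out",
    learning_rate=1.25e-5,
    warmup_steps=1,
    # Keeps the warmup phase, then holds the learning rate flat at 1.25e-5.
    # Use "constant" instead if you want no warmup at all.
    lr_scheduler_type="constant_with_warmup",
    # ... remaining arguments from your original setup ...
)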