machine-learning, deep-learning, huggingface-transformers, huggingface-trainer, learning-rate

How to fix the learning rate for Huggingface's Trainer?


I'm training a model with the following parameters:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir                   = "./out", 
    overwrite_output_dir         = True,
    do_train                     = True,
    do_eval                      = True,
    
    per_device_train_batch_size  = 2, 
    gradient_accumulation_steps  = 4,
    per_device_eval_batch_size   = 8, 
    
    learning_rate                = 1.25e-5,
    warmup_steps                 = 1,
    
    save_total_limit             = 1,
       
    evaluation_strategy          = "epoch",
    save_strategy                = "epoch",
    logging_strategy             = "epoch",  
    num_train_epochs             = 5,   
    
    gradient_checkpointing       = True,
    fp16                         = True,    
        
    predict_with_generate        = True,
    generation_max_length        = 225,
          
    report_to                    = ["tensorboard"],
    load_best_model_at_end       = True,
    metric_for_best_model        = "wer",
    greater_is_better            = False,
    push_to_hub                  = False,
)

I assumed that warmup_steps=1 would fix the learning rate. However, after training finished, I looked at the file trainer_state.json, and it seems that the learning rate is not fixed.

Here are the values of learning_rate and step:

learning_rate    step

1.0006e-05       1033
7.5062e-06       2066
5.0058e-06       3099
2.5053e-06       4132
7.2618e-09       5165

It seems that the learning rate is not fixed at 1.25e-5 (after step 1). What am I missing? How do I fix the learning rate?


Solution

  • A warm-up is, in general, an increase of the learning rate: it starts at 0 and increases linearly over warmup_steps (here, 1 step) to the specified learning rate of 1.25e-5.

    Afterwards, by default, a linear learning-rate scheduler (or, with other settings, e.g. a cosine one) decays your learning rate down to 0 over the remaining training steps. That is exactly the decay you see in trainer_state.json.

    To disable the decay, add lr_scheduler_type='constant'. If I recall correctly, this also disables the warmup.
    If you want a warmup followed by a constant rate, use lr_scheduler_type='constant_with_warmup' instead.
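
    As a minimal sketch (reusing only a few of the arguments from the question; the rest are assumed unchanged), this keeps the learning rate at 1.25e-5 after the one-step warmup:

    from transformers import Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(
        output_dir        = "./out",
        learning_rate     = 1.25e-5,
        warmup_steps      = 1,
        lr_scheduler_type = "constant_with_warmup",  # warm up for 1 step, then hold 1.25e-5
        num_train_epochs  = 5,
        # ... remaining arguments as in the question ...
    )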