[python] [tensorflow] [keras] [supervised-learning] [adam]

Why does my learning rate decrease, even when loss is improving?


I am training my Keras model on a Google Colab TPU as follows:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger

adam = Adam(lr=0.002)
model.compile(loss='mse', metrics=[PSNRLoss, SSIMLoss], optimizer=adam)

checkpoint = ModelCheckpoint("model_{epoch:02d}.hdf5", monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5,
                              patience=5, min_lr=0.00002)
csv_logger = CSVLogger('history.log')

callbacks_list = [checkpoint, reduce_lr, csv_logger]

model.fit(traindb, batch_size=1024, callbacks=callbacks_list,
          shuffle=True, epochs=100, verbose=2, validation_data=validdb)

During training, my learning rate is reduced by a factor of 0.5 even though the loss is still improving at the current learning rate. As you can see in the log below, it drops from 0.0020 to 0.0010 to 0.0005.

Epoch 00011: loss improved from 0.00647 to 0.00646, saving model to ./models_x4/no_noise/dcscn_x2_11.hdf5
1939/1939 - 109s - PSNRLoss: 23.7280 - loss: 0.0065 - SSIMLoss: 0.3329 - val_PSNRLoss: 23.9022 - val_loss: 0.0066 - val_SSIMLoss: 0.3815 - lr: 0.0020
Epoch 12/100

Epoch 00012: loss improved from 0.00646 to 0.00645, saving model to ./models_x4/no_noise/dcscn_x2_12.hdf5
1939/1939 - 111s - PSNRLoss: 23.7245 - loss: 0.0065 - SSIMLoss: 0.3331 - val_PSNRLoss: 23.9397 - val_loss: 0.0066 - val_SSIMLoss: 0.3705 - lr: 0.0020
Epoch 13/100

Epoch 00013: loss improved from 0.00645 to 0.00644, saving model to ./models_x4/no_noise/dcscn_x2_13.hdf5
1939/1939 - 110s - PSNRLoss: 23.7300 - loss: 0.0064 - SSIMLoss: 0.3321 - val_PSNRLoss: 23.9827 - val_loss: 0.0065 - val_SSIMLoss: 0.3745 - lr: 0.0020
Epoch 14/100

Epoch 00014: loss improved from 0.00644 to 0.00643, saving model to ./models_x4/no_noise/dcscn_x2_14.hdf5
1939/1939 - 111s - PSNRLoss: 23.7279 - loss: 0.0064 - SSIMLoss: 0.3376 - val_PSNRLoss: 23.9079 - val_loss: 0.0066 - val_SSIMLoss: 0.3959 - lr: 0.0020
Epoch 15/100

Epoch 00015: loss improved from 0.00643 to 0.00634, saving model to ./models_x4/no_noise/dcscn_x2_15.hdf5
1939/1939 - 110s - PSNRLoss: 23.8356 - loss: 0.0063 - SSIMLoss: 0.3408 - val_PSNRLoss: 23.7063 - val_loss: 0.0067 - val_SSIMLoss: 0.3799 - lr: 0.0010
Epoch 16/100

Epoch 00016: loss did not improve from 0.00634
1939/1939 - 107s - PSNRLoss: 23.8173 - loss: 0.0063 - SSIMLoss: 0.3398 - val_PSNRLoss: 23.7282 - val_loss: 0.0067 - val_SSIMLoss: 0.3853 - lr: 0.0010
Epoch 17/100

Epoch 00017: loss did not improve from 0.00634
1939/1939 - 110s - PSNRLoss: 23.8199 - loss: 0.0063 - SSIMLoss: 0.3426 - val_PSNRLoss: 23.7202 - val_loss: 0.0067 - val_SSIMLoss: 0.4082 - lr: 0.0010
Epoch 18/100

Epoch 00018: loss did not improve from 0.00634
1939/1939 - 110s - PSNRLoss: 23.8138 - loss: 0.0063 - SSIMLoss: 0.3393 - val_PSNRLoss: 23.7523 - val_loss: 0.0066 - val_SSIMLoss: 0.4037 - lr: 0.0010
Epoch 19/100

Epoch 00019: loss improved from 0.00634 to 0.00634, saving model to ./models_x4/no_noise/dcscn_x2_19.hdf5
1939/1939 - 110s - PSNRLoss: 23.8189 - loss: 0.0063 - SSIMLoss: 0.3406 - val_PSNRLoss: 23.7188 - val_loss: 0.0067 - val_SSIMLoss: 0.4115 - lr: 0.0010
Epoch 20/100

Epoch 00020: loss improved from 0.00634 to 0.00634, saving model to ./models_x4/no_noise/dcscn_x2_20.hdf5
1939/1939 - 108s - PSNRLoss: 23.8176 - loss: 0.0063 - SSIMLoss: 0.3407 - val_PSNRLoss: 23.7692 - val_loss: 0.0066 - val_SSIMLoss: 0.3883 - lr: 0.0010
Epoch 21/100

Epoch 00021: loss improved from 0.00634 to 0.00627, saving model to ./models_x4/no_noise/dcscn_x2_21.hdf5
1939/1939 - 108s - PSNRLoss: 23.8889 - loss: 0.0063 - SSIMLoss: 0.3478 - val_PSNRLoss: 24.0306 - val_loss: 0.0064 - val_SSIMLoss: 0.3544 - lr: 5.0000e-04
Epoch 22/100

Epoch 00022: loss improved from 0.00627 to 0.00627, saving model to ./models_x4/no_noise/dcscn_x2_22.hdf5
1939/1939 - 109s - PSNRLoss: 23.8847 - loss: 0.0063 - SSIMLoss: 0.3466 - val_PSNRLoss: 24.0461 - val_loss: 0.0064 - val_SSIMLoss: 0.3679 - lr: 5.0000e-04

Thanks in advance :) Where am I going wrong? Should I be monitoring a different value?


Solution

  • ReduceLROnPlateau has an argument called min_delta, the threshold for counting a new optimum; its default value is 0.0001. So although the checkpoint log says the loss improved, ReduceLROnPlateau ignores any improvement smaller than min_delta. In your log, the loss improves by only about 0.00001 per epoch, which is below that threshold, so after patience (5) such epochs the learning rate is halved anyway.
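Concretely, in mode='min' Keras only treats an epoch as an improvement when current < best - min_delta; smaller gains still advance the patience counter. Here is a minimal pure-Python sketch of that logic (simplified from the Keras callback; simulate_plateau is an illustrative name, not a Keras API):

```python
def simulate_plateau(losses, lr=0.002, factor=0.5, patience=5,
                     min_delta=1e-4, min_lr=2e-5):
    """Return the learning rate left after feeding `losses` epoch by epoch."""
    best = float("inf")
    wait = 0
    for loss in losses:
        if loss < best - min_delta:   # improvement must exceed min_delta
            best = loss
            wait = 0
        else:                         # tiny improvements still count as a plateau
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr

# Losses that improve by only 0.00001 per epoch, as in the log above:
print(simulate_plateau([0.00647 - 1e-5 * i for i in range(6)]))  # 0.001
```

So one possible fix is to pass a smaller threshold, e.g. ReduceLROnPlateau(monitor='loss', factor=0.5, patience=5, min_delta=1e-5, min_lr=0.00002), or to decide that sub-min_delta gains really are a plateau and let the reduction happen.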