I'm trying to train a MobileNetV3-Large with a simple PyTorch learning rate scheduler. This is the portion of the code responsible for training:
bench_val_loss = 1000
bench_acc = 0.0
epochs = 15
optimizer = optim.Adam(embeddingNet.parameters(), lr=1e-3)
loss_optimizer = torch.optim.Adam(loss_fn.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=3, threshold=0.02)
for epoch in range(1, epochs + 1):
    print(f'current lr: {scheduler.get_last_lr()}')
    loss = train(embeddingNet, loss_fn, device, train_dataloader, optimizer, loss_optimizer, epoch)
    val_loss, accuracy = test(train_dataset, val_dataset, embeddingNet, accuracy_calculator, loss_fn, epoch, val_dataloader)
    #val_loss = simpleTest(train_dataset, val_dataset, embeddingNet, accuracy_calculator, loss_fn, epoch, val_dataloader)
    torch.save(embeddingNet.state_dict(), 'my/path/mobileNetV3L_ArcFaceLAST.pth')
    if accuracy >= bench_acc:
        bench_val_loss = val_loss
        torch.save(embeddingNet.state_dict(), 'my/path/mobileNetV3L_ArcFaceBEST.pth')
    writer.add_scalars('Training vs. Validation Loss',
                       {'Training': loss, 'Validation': val_loss},
And here are the first 7 training logs:
Test set accuracy (Precision@1) = 0.17834772304046048
current lr: [0.001]
Epoch 3: Loss = 39.68284225463867
Epoch 3: valLoss = 39.9765007019043
100%|██████████| 962/962 [01:43<00:00, 9.28it/s]
100%|██████████| 370/370 [00:41<00:00, 8.92it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.31242593533096324
current lr: [0.001]
Epoch 4: Loss = 39.4412841796875
Epoch 4: valLoss = 39.67761562450512
100%|██████████| 962/962 [01:45<00:00, 9.11it/s]
100%|██████████| 370/370 [00:41<00:00, 8.86it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.3633824276282377
current lr: [0.001]
Epoch 5: Loss = 39.09823989868164
Epoch 5: valLoss = 39.54649614901156
100%|██████████| 962/962 [01:42<00:00, 9.37it/s]
100%|██████████| 370/370 [00:41<00:00, 8.87it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.44244117149145085
current lr: [0.001]
Epoch 6: Loss = 38.70449447631836
Epoch 6: valLoss = 39.1865906792718
100%|██████████| 962/962 [01:45<00:00, 9.15it/s]
100%|██████████| 370/370 [00:39<00:00, 9.25it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.5167597765363129
current lr: [0.0001]
I can't figure out why the scheduler decided to reduce the learning rate even though the accuracy was increasing by more than the threshold each epoch.
Where is the error?
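With mode='min', ReduceLROnPlateau reduces the learning rate when the monitored quantity, i.e. the value you pass to scheduler.step(), stops decreasing. If that value is accuracy, which you want to increase, every improving epoch is counted as a bad one, so you should use mode='max'. Also note that threshold is relative by default (threshold_mode='rel'): with mode='min' and threshold=0.02, an epoch only counts as an improvement if the value falls below 98% of the best one seen so far. If you pass val_loss instead, the drop from 39.98 to 39.19 in your logs is just under 2% in total, so epochs 4 to 6 do not count as improvements either, and the learning rate is cut once more than patience=3 epochs pass without one.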
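To see the effect of the mode setting without running a full training, here is a minimal pure-Python sketch of the plateau rule. It is not PyTorch's code: simulate_plateau is a hypothetical helper that assumes the default threshold_mode='rel' and ignores cooldown, min_lr and eps. The accuracy values are the ones from the logs above, rounded to three decimals.

```python
import math


def simulate_plateau(values, mode="min", patience=3, threshold=0.02,
                     lr=1e-3, factor=0.1):
    """Replay a simplified ReduceLROnPlateau rule over per-epoch metric values.

    Hypothetical helper: relative threshold only, no cooldown/min_lr/eps.
    Returns the learning rate in effect after each scheduler step.
    """
    best = math.inf if mode == "min" else -math.inf
    bad_epochs = 0
    lrs = []
    for value in values:
        if mode == "min":
            # Improvement only if the value drops below best * (1 - threshold).
            improved = value < best * (1.0 - threshold)
        else:
            # Improvement only if the value rises above best * (1 + threshold).
            improved = value > best * (1.0 + threshold)
        if improved:
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1
        # The rate is cut once MORE than `patience` bad epochs have accumulated.
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs


# Precision@1 values from the logs in the question (rounded).
accuracy = [0.178, 0.312, 0.363, 0.442, 0.517]

# mode='min': rising accuracy looks like four bad epochs, so the rate is cut
# on the fifth value.
print(["%.0e" % lr for lr in simulate_plateau(accuracy, mode="min")])

# mode='max': every epoch is an improvement, so the rate never changes.
print(["%.0e" % lr for lr in simulate_plateau(accuracy, mode="max")])
```

In the real training loop the equivalent change would be to create the scheduler with mode='max' and call scheduler.step(accuracy) once per epoch after validation, or to keep mode='min' and call scheduler.step(val_loss) with a threshold that suits the scale of your loss.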