I'm new to PyTorch
and am working on a toy example to understand how learning rate decay works for the learning rate passed into the optimizer. When I use MultiStepLR
, I expected the learning rate to decrease at the given epoch numbers; however, it does not work as I intended. What am I doing wrong?
import random
import torch
import pandas as pd
import numpy as np
from torch import nn
from torch.utils.data import Dataset, DataLoader, TensorDataset
from torchvision import datasets, transforms

n_input, n_hidden, n_out = 4, 4, 4  # toy dimensions

model = nn.Sequential(nn.Linear(n_input, n_hidden),
                      nn.ReLU(),
                      nn.Linear(n_hidden, n_out),
                      nn.ReLU())
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 4], gamma=0.1)

for e in range(5):
    scheduler.step()
    print(e, ' : lr', scheduler.get_lr()[0], "\n")
0 : lr 0.1
1 : lr 0.0010000000000000002
2 : lr 0.010000000000000002
3 : lr 0.00010000000000000003
4 : lr 0.0010000000000000002
The learning rate schedule I expected is [0.1, 0.1, 0.01, 0.01, 0.001]
When running your code I get the following warnings:
/home/user/anaconda3/envs/eai/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`.
In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.
Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.
See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
/home/user/anaconda3/envs/eai/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:429: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
warnings.warn("To get the last learning rate computed by the scheduler, "
Your code can be fixed by following the warning messages and using get_last_lr():
import random
import torch
import pandas as pd
import numpy as np
from torch import nn
from torch.utils.data import Dataset, DataLoader, TensorDataset
from torchvision import datasets, transforms

model = nn.Sequential(nn.Linear(4, 4),
                      nn.ReLU(),
                      nn.Linear(4, 4),
                      nn.ReLU())
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 4], gamma=0.1)

for e in range(5):
    scheduler.step()
    print(e, ' : lr', scheduler.get_last_lr(), "\n")
With output:
0 : lr [0.1]
1 : lr [0.010000000000000002]
2 : lr [0.010000000000000002]
3 : lr [0.0010000000000000002]
4 : lr [0.0010000000000000002]
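Note that these values are shifted by one epoch relative to the expected [0.1, 0.1, 0.01, 0.01, 0.001], because scheduler.step() runs before the print. If the goal is to log the rate actually used in a given epoch, one option (a sketch, assuming a fresh optimizer and scheduler) is to print first and step at the end of the epoch:

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 4], gamma=0.1)
for e in range(5):
    print(e, ' : lr', scheduler.get_last_lr())   # rate used during this epoch
    # ... forward pass, loss.backward() and optimizer.step() would go here ...
    scheduler.step()                             # advance the schedule at the end of the epoch

With the training step in place, this should log the expected sequence 0.1, 0.1, 0.01, 0.01, 0.001.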
If you want the learning rate to decrease every epoch instead, swap MultiStepLR for a scheduler that decays on every step, e.g. StepLR with step_size=1 or ExponentialLR (MultiStepLR's milestones argument is required, so it cannot simply be removed).
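For example, a minimal sketch using ExponentialLR (gamma=0.1 is just an illustrative choice) multiplies the learning rate by gamma after every epoch:

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)
for e in range(5):
    # ... training step(s) with optimizer.step() would go here ...
    scheduler.step()                            # decay the learning rate every epoch
    print(e, ' : lr', scheduler.get_last_lr())

StepLR(optimizer, step_size=1, gamma=0.1) behaves the same way in this case.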