Here is a simple piece of code that shows the problem:
import torch

X = torch.arange(-3, 3, step=0.1)
Y = X * 3
Y += 0.1 * torch.randn(Y.shape)

def my_train_model(iter):
    w = torch.tensor(-15.0, requires_grad=True)
    lr = 0.1
    for epoch in range(iter):
        print(w.grad)
        yhat = w * X
        loss = torch.mean((yhat - Y) ** 2)
        loss.backward()
        with torch.no_grad():
            print(w.grad)
            w = w - lr * w.grad  # gradient exists if w -= lr * w.grad is used instead
            print(w.grad)
            w.grad.zero_()
        print(loss)

my_train_model(4)
This sets w.grad to None after performing w = w - lr * w.grad, but the problem goes away if I use w -= lr * w.grad instead of that expression. What is wrong with the first expression that makes it set w.grad to None?
The function torch.no_grad() guarantees that no gradients are computed inside its block, which means any tensor created in there comes out with requires_grad=False, as you can see in this example.
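A minimal sketch of that behaviour (the tensors a and b are hypothetical, not from the question): the result of an operation performed inside torch.no_grad() does not require a gradient, even though its input does.

import torch

a = torch.tensor(2.0, requires_grad=True)
with torch.no_grad():
    b = a * 3                # created inside no_grad()
print(a.requires_grad)       # True
print(b.requires_grad)       # False -- b is detached from the autograd graph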
An in-place operation only changes the content of the tensor. As this answer from the forum puts it:

An in-place operation is an operation that changes directly the content of a given Tensor without making a copy.
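A small illustration of that definition, using a hypothetical tensor t: the in-place op keeps the same underlying storage, while the out-of-place version rebinds the name to a brand-new tensor.

import torch

t = torch.ones(3)
before = t.data_ptr()            # address of the underlying storage
t.add_(1)                        # in-place: same storage, contents changed
print(t.data_ptr() == before)    # True
t = t + 1                        # out-of-place: t now names a new tensor
print(t.data_ptr() == before)    # False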
Therefore, in the problem you describe, w = w - lr * w.grad replaces w with a new copy of itself that carries no gradient, while w -= lr * w.grad only changes the content of the existing tensor, which therefore still keeps the gradient it accumulated from the operations outside of torch.no_grad().
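To make the contrast concrete, here is a minimal sketch (with a hypothetical one-step toy loss, not the training loop from the question) of what each update leaves behind in w.grad:

import torch

def step(in_place):
    w = torch.tensor(-15.0, requires_grad=True)
    loss = (w * 2) ** 2
    loss.backward()              # w.grad is now populated (8 * w = -120)
    with torch.no_grad():
        if in_place:
            w -= 0.1 * w.grad    # same leaf tensor, w.grad still attached
        else:
            w = w - 0.1 * w.grad # w now names a fresh tensor with no .grad
    return w.grad

print(step(in_place=True))   # tensor(-120.) -- gradient still there
print(step(in_place=False))  # None -- the new tensor never had a .grad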