I'm trying to get some insight into how PyTorch works by implementing Newton's method for solving x = cos(x). Here's a version that works:
x = Variable(DoubleTensor([1]), requires_grad=True)
for i in range(5):
    y = x - torch.cos(x)
    y.backward()
    x = Variable(x.data - y.data/x.grad.data, requires_grad=True)
print(x.data)  # tensor([0.7390851332151607], dtype=torch.float64) (correct)
This code seems inelegant (inefficient?) to me, since it recreates the entire computational graph on each step of the for loop (right?). I tried to avoid this by simply updating the data held by each of the variables instead of recreating them:
x = Variable(DoubleTensor([1]), requires_grad=True)
y = x - torch.cos(x)
y.backward(retain_graph=True)
for i in range(5):
    x.data = x.data - y.data/x.grad.data
    y.data = x.data - torch.cos(x.data)
    y.backward(retain_graph=True)
print(x.data)  # tensor([0.7417889255761136], dtype=torch.float64) (wrong)
It seems like, with DoubleTensors, I'm carrying enough digits of precision to rule out round-off error. So where's the error coming from?
Possibly related: the above snippet breaks without the retain_graph=True flag set at every step of the for loop. The error message I get if I omit it within the loop, but retain it on line 3, is:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

This seems like evidence that I'm misunderstanding something...
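For reference, that freed-buffers error can be reproduced in isolation: by default, backward() frees the graph's intermediate buffers, so a second backward() through the same graph fails. A minimal sketch (modern tensor API rather than Variable):

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * x            # builds a small graph that saves x for backward
y.backward()         # frees the graph's saved buffers after use
try:
    y.backward()     # second pass through the same graph
except RuntimeError as e:
    print("second backward failed:", e)
```

Passing retain_graph=True to the first backward() keeps the buffers alive, which is why the snippet above only runs with that flag.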
I think your first version of the code is the right approach: PyTorch builds the computation graph dynamically on every forward pass anyway, so re-evaluating the function on each iteration is how it is meant to be used. Here is a cleaned-up version:
import torch

# initial guess
guess = torch.tensor([1], dtype=torch.float64, requires_grad=True)

# function to optimize
def my_func(x):
    return x - torch.cos(x)

def newton(func, guess, runs=5):
    for _ in range(runs):
        # evaluate the function with the current value of `guess`
        value = func(guess)
        value.backward()
        # Newton update: guess <- guess - f(guess) / f'(guess)
        guess.data -= (value / guess.grad).data
        # zero out the current gradient so the next backward() starts fresh
        guess.grad.data.zero_()
    return guess.data  # return the final `guess` after `runs` updates

# call starts
result = newton(my_func, guess)

# output of `result`
tensor([0.7391], dtype=torch.float64)
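As an aside, mutating .data is discouraged in current PyTorch; the usual idiom is to perform the update inside a torch.no_grad() block. A sketch of the same iteration in that style (same function and starting point as above):

```python
import torch

def my_func(x):
    return x - torch.cos(x)

guess = torch.tensor([1.0], dtype=torch.float64, requires_grad=True)
for _ in range(5):
    value = my_func(guess)
    value.backward()
    with torch.no_grad():            # perform the update outside the autograd graph
        guess -= value / guess.grad  # Newton step: x <- x - f(x) / f'(x)
    guess.grad.zero_()               # reset the accumulated gradient

print(guess)  # converges to the fixed point of cos, ~0.7390851332151607
```

This avoids touching .data entirely while keeping the same Newton update.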
In each run, the function my_func(), which defines the computation graph, is evaluated with the current guess value. Once it returns, we compute the gradient with the value.backward() call. With this gradient we update guess, and then zero out the gradient so that it is fresh for the next value.backward() call. Without zeroing, backward() would by default keep accumulating gradients across iterations, which is behaviour we want to avoid here.
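The accumulation behaviour is easy to see in isolation; a minimal sketch:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

y = (x * x).sum()   # dy/dx = 2x = 4
y.backward()
print(x.grad)       # tensor([4.])

y = (x * x).sum()   # rebuild the graph and backprop again
y.backward()        # without zeroing, gradients accumulate
print(x.grad)       # tensor([8.]), not tensor([4.])

x.grad.zero_()      # reset, as newton() does after each update
```

This is why newton() calls guess.grad.data.zero_() at the end of every iteration.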