I have the following training code. I am quite sure I call loss.backward() just once per batch, and yet I am getting the error from the title. What am I doing wrong? Note that X_train_tensor is the output of another graph computation, so it has requires_grad=True, as you can see in the print statement. Is this the source of the problem, and if so, how can I change it? PyTorch won't allow me to toggle requires_grad directly on the tensor.
for iter in range(max_iters):
    start_ix = 0
    loss = None
    while start_ix < len(X_train_tensor):
        loss = None
        end_ix = min(start_ix + batch_size, len(X_train_tensor))
        out, loss, accuracy = model(X_train_tensor[start_ix:end_ix], y_train_tensor[start_ix:end_ix])
        # every once in a while evaluate the loss on train and val sets
        if (start_ix == 0) and (iter % 10 == 0 or iter == max_iters - 1):
            out_val, loss_val, accuracy_val = model(X_val_tensor, y_val_tensor)
            print(f"step {iter}: train loss={loss:.2f} train_acc={accuracy:.3f} | val loss={loss_val:.2f} val_acc={accuracy_val:.3f} {datetime.datetime.now()}")
        optimizer.zero_grad(set_to_none=True)
        print(iter, start_ix, X_train_tensor.requires_grad, y_train_tensor.requires_grad, loss.requires_grad)
        loss.backward()
        optimizer.step()
        start_ix = end_ix + 1
This is the error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Update: this is where the model input tensors are coming from, as the output of another (autoencoder) model:

autoencoder.eval()
with torch.no_grad():  # it seems like adding this line solves the problem?
    X_train_encoded, loss = autoencoder(X_train_tensor)
    X_val_encoded, loss = autoencoder(X_val_tensor)
    X_test_encoded, loss = autoencoder(X_test_tensor)
Adding the with torch.no_grad() line above solves the issue, but I don't understand why. Does it actually change how the outputs are generated? How does that work?
From what I understand, your training input tensor is the output of the autoencoder. When you do not wrap the encoding step in torch.no_grad(), a computational graph is created for the outputs of the autoencoder, linking the autoencoder's operations and weights to the encoded tensors. Since your model consumes those encoded tensors, the model's loss is connected to the autoencoder's computational graph.
When you call loss.backward() the first time, PyTorch traverses the entire computational graph, including the autoencoder's portion, to compute gradients, and then frees the intermediate tensors saved in the graph. When you call loss.backward() in the second iteration of the loop, your model's forward pass has built a fresh graph, but that graph still feeds back into the autoencoder's graph, whose saved tensors have already been freed. That is exactly what the error message describes.
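The failure mode can be reproduced in a few lines. Here is a minimal sketch, with torch.sigmoid standing in for the autoencoder as the shared upstream graph (all names here are illustrative, not from the question's code):

```python
import torch

# One upstream graph (the "autoencoder" stand-in) shared by two downstream losses.
w = torch.randn(3, requires_grad=True)
encoded = torch.sigmoid(w)      # upstream graph, built only once

loss1 = (encoded ** 2).sum()
loss1.backward()                # frees the saved tensors of the shared graph

loss2 = (encoded ** 2).sum()    # new downstream graph, same upstream part
err = None
try:
    loss2.backward()            # walks back into the freed upstream graph
except RuntimeError as e:
    err = e
print(err)  # RuntimeError: Trying to backward through the graph a second time ...
```

Passing retain_graph=True to the first backward() would silence the error, but it would also keep accumulating gradients into the autoencoder's weights on every batch, which is usually not what you want when the autoencoder is frozen.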
torch.no_grad() prevents PyTorch from building the autoencoder's computational graph in the first place, so the encoded tensors come out with requires_grad=False and the resulting loss is never linked to the autoencoder.
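If you cannot wrap the encoding step in torch.no_grad(), calling .detach() on the encoded tensors achieves the same decoupling after the fact. A sketch of both options, with a hypothetical nn.Linear standing in for the encoder (the real autoencoder in the question returns a (encoded, loss) pair):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the autoencoder's encoder.
autoencoder = nn.Linear(8, 4)
X_train_tensor = torch.randn(16, 8)

# Option 1: torch.no_grad() -- no graph is recorded during encoding at all.
with torch.no_grad():
    X_train_encoded = autoencoder(X_train_tensor)
print(X_train_encoded.requires_grad)    # False

# Option 2: detach() -- the graph is built, but the output is cut loose from it.
X_train_detached = autoencoder(X_train_tensor).detach()
print(X_train_detached.requires_grad)   # False
```

Either way, the downstream model's loss no longer reaches back into the autoencoder, so each batch's backward() only touches that batch's own graph.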