
Autograd.grad() for Tensor in pytorch


I want to compute the gradient between two tensors in a net. The input tensor X (batch size x m) is passed through a set of convolutional layers, which return an output tensor Y (batch size x n).

I’m creating a new loss and would like to know the gradient of Y w.r.t. X. In TensorFlow this would be something like:

tf.gradients(ys=Y, xs=X)

Unfortunately, I’ve been experimenting with torch.autograd.grad(), but I could not figure out how to do it. I get errors like: “RuntimeError: grad can be implicitly created only for scalar outputs”.

What should the inputs to torch.autograd.grad() be if I want to know the gradient of Y w.r.t. X?


Solution

  • Let's start with a simple working example that uses a plain loss function and a regular backward pass. We will build a short computational graph and do some grad computations on it.

    Code:

    import torch
    from torch.autograd import grad
    import torch.nn as nn
    
    
    # Create some dummy data.
    x = torch.ones(2, 2, requires_grad=True)
    gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths" 
    
    # We will use MSELoss as an example.
    loss_fn = nn.MSELoss()
    
    # Do some computations.
    v = x + 2
    y = v ** 2
    
    # Compute loss.
    loss = loss_fn(y, gt)
    
    print(f'Loss: {loss}')
    
    # Now compute gradients:
    d_loss_dx = grad(outputs=loss, inputs=x)
    print(f'dloss/dx:\n {d_loss_dx}')
    

    Output:

    Loss: 42.25
    dloss/dx:
    (tensor([[-19.5000, -19.5000], [-19.5000, -19.5000]]),)
    

    OK, this works! Now let's try to reproduce the error "grad can be implicitly created only for scalar outputs". Notice that the loss in the previous example is a scalar. By default, backward() and grad() expect a single scalar output and implicitly use a gradient of 1 for it, so loss.backward() is equivalent to loss.backward(torch.tensor(1.)). If you pass a tensor with more than one element without supplying that gradient yourself, you get an error.

    Code:

    v = x + 2
    y = v ** 2
    
    try:
        dy_hat_dx = grad(outputs=y, inputs=x)
    except RuntimeError as err:
        print(err)
    

    Output:

    grad can be implicitly created only for scalar outputs

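    Therefore, when calling grad() on a non-scalar output you need to pass the grad_outputs parameter, a tensor with the same shape as the output. grad() then returns a vector-Jacobian product: the gradient of each output element is weighted by the matching entry of grad_outputs and the results are summed. With torch.ones_like(y) this is the gradient of y.sum():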

    Code:

    v = x + 2
    y = v ** 2
    
    dy_dx = grad(outputs=y, inputs=x, grad_outputs=torch.ones_like(y))
    print(f'dy/dx:\n {dy_dx}')
    
    dv_dx = grad(outputs=v, inputs=x, grad_outputs=torch.ones_like(v))
    print(f'dv/dx:\n {dv_dx}')
    

    Output:

    dy/dx:
    (tensor([[6., 6.], [6., 6.]]),)
    
    dv/dx:
    (tensor([[1., 1.], [1., 1.]]),)
    

    NOTE: If you are using backward() instead, simply call y.backward(torch.ones_like(y)); the gradient is then accumulated in x.grad.
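
To connect this back to the question, here is a minimal sketch for a batched network. The small `nn.Sequential` model and the sizes `batch_size`, `m`, `n` are hypothetical stand-ins for the asker's convolutional net, which is not shown. Passing `grad_outputs=torch.ones_like(Y)` gives the gradient of `Y.sum()` w.r.t. X (shape batch size x m), not the full Jacobian. If every individual dY[b, k]/dX[b', j] is needed, `torch.autograd.functional.jacobian` is one way to get it. Since the gradient is meant to go into a new loss, the sketch also passes `create_graph=True` so that the result stays differentiable.

```python
import torch
import torch.nn as nn
from torch.autograd import grad
from torch.autograd.functional import jacobian

torch.manual_seed(0)

# Hypothetical sizes and model, chosen only for illustration.
batch_size, m, n = 4, 3, 2
net = nn.Sequential(nn.Linear(m, 8), nn.Tanh(), nn.Linear(8, n))

X = torch.randn(batch_size, m, requires_grad=True)
Y = net(X)  # shape: (batch_size, n)

# Vector-Jacobian product with a vector of ones: d(Y.sum())/dX.
# create_graph=True keeps the result differentiable (and retains the graph),
# so it can be used inside a loss.
(dY_dX,) = grad(outputs=Y, inputs=X,
                grad_outputs=torch.ones_like(Y), create_graph=True)
print(dY_dX.shape)  # same shape as X

# Reducing Y to a scalar first gives the same gradient.
(dsum_dX,) = grad(outputs=Y.sum(), inputs=X, retain_graph=True)
print(torch.allclose(dY_dX, dsum_dX, atol=1e-6))

# Full Jacobian: J[b, k, b2, j] = dY[b, k] / dX[b2, j].
J = jacobian(net, X)
print(J.shape)  # Y.shape + X.shape
# Summing over the output dimensions recovers the ones-weighted result.
print(torch.allclose(J.sum(dim=(0, 1)), dY_dX, atol=1e-6))

# Example of using the gradient in a loss term (a simple gradient penalty).
penalty = dY_dX.pow(2).sum()
penalty.backward()  # fills .grad on X and on the parameters it depends on
```

Whether the summed gradient or the full Jacobian is the right quantity depends on the loss being built. For a net that processes each sample independently, the summed version already gives, for each row of X, the gradient of that sample's outputs added together.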