python pytorch mlp

Input Derivative of a NN in the Loss function in PyTorch


I am trying to approximate a nonlinear function $V(x):\mathbb{R}^n\to \mathbb{R}_+$ with an MLP in PyTorch, e.g. V_x = model(x).

Only $N$ samples of the gradient $\nabla V^T(x) = \frac{\partial V(x)}{\partial x}$ are available, so I have a matrix S of dimension $N\times n$ that contains all the samples.

The loss should be the mean squared error between S and $\frac{\partial}{\partial x}$ V_x.

My problem is that I don't know how to calculate $\frac{\partial}{\partial x}$ V_x in PyTorch such that it keeps its dependency on the weights of the network.

Below is a minimal example in which the loss does not decrease because of this missing dependency.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


model = nn.Sequential(
    nn.Linear(2, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1)
)

model.float()
loss_fn = nn.MSELoss()  
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Generate Samples 
# V(x) = x^T P x
# grad V(x) = 2Px
P = np.matrix([[20.1892, -26.6218],[-26.6218, 38.0375]])
N_S = 10
N = N_S**2 # amount of samples
x_1 = np.linspace(-3,3,N_S)
x_2 = np.linspace(-3,3,N_S)
x = np.array([(a,b) for a in x_1 for b in x_2])
S = np.zeros((N,2))
for i in range(N):
    S[i,:]=2*P@x[i,:]
        

# training
epoch = 1

while epoch<1000:
    S_tensor = torch.from_numpy(S).float()
    x_tensor = torch.from_numpy(x).float()

    grad_V_x = torch.autograd.functional.jacobian(model,x_tensor)
    grad_V_x.requires_grad_()

    loss = loss_fn(grad_V_x,S_tensor)

    optimizer.zero_grad() # reset gradients
    loss.backward() # calculate gradient 
    optimizer.step() # update weights

    print(f"epoch {epoch} loss {loss}")
    epoch = epoch+1 

Any help is appreciated!


Solution

  • Replace

    while epoch<1000:
        S_tensor = torch.from_numpy(S).float()
        x_tensor = torch.from_numpy(x).float()
    
        grad_V_x = torch.autograd.functional.jacobian(model,x_tensor)
        grad_V_x.requires_grad_()
    
        loss = loss_fn(grad_V_x,S_tensor)
    
        optimizer.zero_grad() # reset gradients
        loss.backward() # calculate gradient 
        optimizer.step() # update weights
    
        print(f"epoch {epoch} loss {loss}")
        epoch = epoch+1 
    

    with

    while epoch<1000:
        S_tensor = torch.from_numpy(S).float()
        x_tensor = torch.from_numpy(x).float()
        
        x_tensor.requires_grad = True
        # Gradient of V w.r.t. x; create_graph=True keeps it differentiable w.r.t. the weights
        V_x = model(x_tensor)
        grad_V_x = torch.autograd.grad(outputs=V_x, inputs=x_tensor, grad_outputs=torch.ones_like(V_x), create_graph=True)
        loss = loss_fn(grad_V_x[0], S_tensor)
    
        optimizer.zero_grad()  # reset gradients
        loss.backward()  # calculate gradient
        optimizer.step()  # update weights
    
        print(f"epoch {epoch} loss {loss}")
        epoch = epoch+1 
    

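    With this change, the loss decreases, because create_graph=True keeps the computed gradient connected to the network weights.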


    For more information about torch.autograd.grad, see the documentation: https://pytorch.org/docs/stable/generated/torch.autograd.grad.html
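To see why the original loop cannot train, it may help to compare the two calls side by side. The following is a minimal sketch, not the code from the question: the small Tanh network, the batch of 5 random points, and the variable names are assumptions chosen for illustration. It checks that the result of torch.autograd.functional.jacobian (with its default create_graph=False) does not track gradients, while the result of torch.autograd.grad with create_graph=True does, so the loss can reach the weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative network (assumption: Tanh instead of the ReLU used in the question)
model = nn.Sequential(nn.Linear(2, 10), nn.Tanh(), nn.Linear(10, 1))
x = torch.randn(5, 2)

# (a) functional.jacobian with the default create_graph=False:
# the result is cut off from the graph, and its shape is the full
# batch-by-batch Jacobian rather than one gradient row per sample.
jac = torch.autograd.functional.jacobian(model, x)
print(jac.shape)          # shape (5, 1, 5, 2), not (5, 2)
print(jac.requires_grad)  # False: calling requires_grad_() on it only creates a new leaf

# (b) autograd.grad with create_graph=True:
# samples are independent, so summing the outputs (grad_outputs of ones)
# yields the per-sample gradient dV/dx in each row.
x_req = x.clone().requires_grad_(True)
V = model(x_req)
(grad_V,) = torch.autograd.grad(
    outputs=V,
    inputs=x_req,
    grad_outputs=torch.ones_like(V),
    create_graph=True,
)
print(grad_V.shape)          # shape (5, 2)
print(grad_V.requires_grad)  # True: still connected to the weights

# Target gradients 2 P x, using the matrix P from the question
P = torch.tensor([[20.1892, -26.6218], [-26.6218, 38.0375]])
S = 2 * x @ P.T

loss = nn.functional.mse_loss(grad_V, S)
loss.backward()

# The first-layer weights now receive a gradient from the loss
print(model[0].weight.grad is not None)
```

One consequence worth keeping in mind: the bias of the last layer does not appear in $\frac{\partial V}{\partial x}$, so gradient samples alone determine V only up to an additive constant.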