python, r, gradient-descent, objective-function

Implementing gradient descent with a known objective function


I have an objective function from a paper that I would like to minimize with gradient descent. I have not yet had to do this "from scratch" and would like some advice as to how to code it up manually. The objective function is:

T(L) = tr(X.T L^s X) - beta * ||L||.

where L is an N x N positive semidefinite matrix to be estimated, X is an N x M matrix, beta is a regularization constant, X.T = X transpose, and ||.|| is the Frobenius norm.

Also, L^s is the matrix power computed from the eigendecomposition, L^s = F Λ^s F.T, where F is the matrix of eigenvectors of L and Λ is the diagonal matrix of its eigenvalues.

The derivative of the objective function is:

dT/dL = sum_{r=0}^{s-1} L^r (X X.T) L^(s-r-1) - 2 * beta * L

I have done very rudimentary gradient descent problems (such as matrix factorization), where optimization is done over every element of the matrix, or by using packages/libraries. This kind of problem is more complex than what I am used to, and I was hoping that some of you who are much more experienced with this sort of thing could help me out.

Any general advice is much appreciated, as well as specific recommendations on how to code this up in Python or R.

Here is the link for the paper with this function: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128136#sec016

Thank you very much for your help!

Paul


Solution

  • In general, it would probably be advisable to use a machine learning library such as TensorFlow or PyTorch. If you go down this route you get several advantages: 1) efficient C++ implementations of the tensor operations, 2) automatic differentiation, and 3) easy access to more sophisticated optimizers (e.g. Adam). If you prefer to do the gradient computation yourself, you can do that by setting the gradient L.grad manually before the optimization step (a sketch of this is given after the example below).

    A simple implementation would look like this:

    import torch

    n = 10        # dimension of L (N x N)
    m = 20        # number of columns of X (N x M)
    s = 3         # power of L in the objective
    b = 1e-3      # regularization constant beta
    n_it = 40     # number of gradient steps

    # Parametrize L through F and D so that L = F diag(D) F^T.
    # (Optimizing L directly would be: L = torch.nn.Parameter(torch.rand(n, n)))
    F = torch.nn.Parameter(torch.rand(n, n))
    D = torch.nn.Parameter(torch.rand(n))
    X = torch.rand((n, m))
    opt = torch.optim.SGD([F, D], lr=1e-4)

    for i in range(n_it):
        # reconstruct L = F diag(D) F^T and L^s = F diag(D^s) F^T
        L = F.matmul(D.unsqueeze(1) * F.T)
        Ls = F.matmul((D ** s).unsqueeze(1) * F.T)
        # objective T(L) = tr(X^T L^s X) - beta * ||L||_F
        loss = X.T.matmul(Ls).matmul(X).trace() - b * L.norm()
        print(loss.item())
        opt.zero_grad()
        loss.backward()
        opt.step()
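
    If you want to supply the gradient yourself instead, a minimal sketch of the second approach (assuming a recent PyTorch version where torch.linalg.matrix_power is available, and using the derivative formula from the question) could look like the following. Note that nothing here keeps L symmetric positive semidefinite during the updates, so in practice you may want to project back onto the PSD cone (e.g. by clipping negative eigenvalues) after each step:

    import torch

    n, m, s = 10, 20, 3
    b = 1e-3       # regularization constant beta
    n_it = 40

    X = torch.rand(n, m)
    XXt = X.matmul(X.T)

    # start from a symmetric PSD matrix
    A = torch.rand(n, n)
    L = torch.nn.Parameter(A.matmul(A.T) / n)
    opt = torch.optim.SGD([L], lr=1e-4)

    for i in range(n_it):
        opt.zero_grad()
        with torch.no_grad():
            # analytic gradient from the question:
            # dT/dL = sum_{r=0}^{s-1} L^r (X X^T) L^(s-r-1) - 2 * beta * L
            P = [torch.linalg.matrix_power(L, r) for r in range(s)]  # L^0, ..., L^(s-1)
            grad = sum(P[r] @ XXt @ P[s - 1 - r] for r in range(s)) - 2 * b * L
        L.grad = grad      # set the gradient manually, then take an SGD step
        opt.step()
        with torch.no_grad():
            # monitor the objective T(L) = tr(X^T L^s X) - beta * ||L||
            Ls = torch.linalg.matrix_power(L, s)
            print((X.T @ Ls @ X).trace().item() - b * L.norm().item())

    For non-integer s you would need the eigendecomposition form L^s = F Λ^s F.T instead of repeated matrix powers. Comparing this manual gradient against the autograd gradient from the first snippet is also a quick sanity check of the derivative formula.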