algorithm, matlab, computational-geometry, difference

Difference between algorithm solution and MATLAB CVX solution in Graphical LASSO?


The Graphical Least Absolute Shrinkage and Selection Operator (Graphical LASSO) was introduced by Jerome Friedman, Trevor Hastie and Robert Tibshirani ("Sparse inverse covariance estimation with the graphical lasso", 2008). They suggest a block coordinate-descent algorithm for solving the problem (see "The Graphical Lasso: New Insights and Alternatives" by Rahul Mazumder and Trevor Hastie). I wrote this simple MATLAB code using CVX, given the regressor matrix X of size m×n:

S = cov(X,0); % sample covariance of X

cvx_begin
    variable theta(n,n) semidefinite
    % elementwise l1 penalty: norm(theta,1) on a matrix would be the
    % induced 1-norm (max column sum), so vectorize theta first
    minimize (trace(S*theta)-log_det(theta)+lambda*norm(theta(:),1))
cvx_end

What's the difference between the block coordinate-descent algorithm solution and the CVX solution? Can CVX be set to give the very same solution? The question refers to the Graphical LASSO algorithm, but it extends to other similar problems in which the authors suggest a specific algorithm (e.g. ADMM) even though a solution can also be found with general optimization packages.


Solution

  • Provided

    1. There is a unique solution (which should be true if your objective is regularized, as it appears to be from the last term).
    2. There are no bugs in either implementation.

    both implementations should return the same solution (up to solver tolerance), since the problem is a convex semidefinite program. The differences you should observe are:

    1. Runtime: one will likely run longer than the other. Your implementation uses a general-purpose solver package (CVX), so I would bet it is slower.
    2. Memory usage: again, I would expect the general-purpose (untuned) package to consume more memory.
    3. Numerical stability: in general, some implementations will be much more numerically stable than others. That is, if you use weak regularization (a very small lambda), you may find that some implementations fail to converge while others still work.

    For small and toy problems this should not be a big deal (which is usually the case if you are an academic). If you are trying to do something useful in the real world, runtime and memory usage tend to be extremely important, as they control what size of problems you can tackle with your approach.

    The only way to know the relative limitations of each approach is to implement and try both! At the very least, I would implement and run both approaches as a sanity check that both implementations are likely correct (the chance of both implementations being incorrect yet reporting the same results across a range of inputs is very low).
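    To illustrate the sanity-check idea on a simpler convex problem than the graphical lasso: below is a sketch (in Python/NumPy, with made-up data) that solves the same lasso objective with two different algorithms, coordinate descent and proximal gradient (ISTA), and checks that they agree to tolerance. The function names and parameters are my own, purely for illustration:

    ```python
    import numpy as np

    def soft_threshold(x, t):
        """Proximal operator of t*|.|_1 (elementwise shrinkage)."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def lasso_cd(A, b, lam, sweeps=500):
        """Coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1."""
        n = A.shape[1]
        x = np.zeros(n)
        col_sq = (A ** 2).sum(axis=0)
        for _ in range(sweeps):
            for j in range(n):
                # residual with coordinate j removed
                r = b - A @ x + A[:, j] * x[j]
                x[j] = soft_threshold(A[:, j] @ r, lam) / col_sq[j]
        return x

    def lasso_ista(A, b, lam, iters=5000):
        """Proximal gradient (ISTA) for the same objective."""
        L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            grad = A.T @ (A @ x - b)
            x = soft_threshold(x - grad / L, lam / L)
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 5))
    b = rng.standard_normal(50)
    x_cd = lasso_cd(A, b, lam=0.5)
    x_ista = lasso_ista(A, b, lam=0.5)
    # Both algorithms minimize the same convex objective,
    # so the solutions should agree to solver tolerance.
    ```

    The same cross-check applies to the graphical lasso: run the block coordinate-descent implementation and the CVX formulation on the same S and lambda, and verify the resulting precision matrices match to within the solvers' tolerances.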