I am trying to reproduce the results from this paper (Weinberger and Saul, 2004, doi:10.5555/1597348.1597471), in which the authors use Maximum Variance Unfolding (MVU) to learn a low-dimensional representation of a few distributions. The general problem of MVU, after dropping the rank constraint, is as follows:

$$
\begin{aligned}
\text{maximize} \quad & \operatorname{tr}(G) \\
\text{subject to} \quad & (e_k - e_l)^\top G\,(e_k - e_l) = \lVert x_k - x_l \rVert^2 \quad \text{for all neighbor pairs } (k, l), \\
& \textstyle\sum_{k,l} G_{kl} = 0, \\
& G \succeq 0.
\end{aligned}
$$

Here, G is the inner product matrix of the data. Each k corresponds to a data point, and each l corresponds to one of its neighbors. The e vectors are "indicator vectors" that are zero in all entries except the one at the index in the subscript, so the first constraint expands to G_kk - 2 G_kl + G_ll = ||x_k - x_l||^2, which is the form used in the code below. The second constraint zero-centers the data; I ignored it in my formulation to keep things as simple as possible. Finally, G must be positive semidefinite.
I am trying to replicate the experiments in the paper above on a dataset which consists of 400 images (76x101 pixels) of the same progressively rotated teapot. I start by importing the data and relevant libraries:
import scipy.io
import cvxpy as cvx
import numpy as np
from sklearn.neighbors import NearestNeighbors
data = scipy.io.loadmat('teapots.mat')
data = data["Input"][0][0][0]
# make it (400 x 23028)
data = data.T.astype(np.float64)
Then, I build an adjacency matrix representation of the k-nearest neighbors of each data point:
n_points = data.shape[0]
# create sparse matrix (400 x 400) with the connections
nn = NearestNeighbors(n_neighbors=4).fit(data)
nn = nn.kneighbors_graph(data).todense()
nn = np.array(nn)
Finally, I try to perform MVU to discover low dimensional embeddings for these data using CVXPY:
# inner product matrix of the original data
X = cvx.Constant(data.dot(data.T))
# inner product matrix of the projected points; PSD constrains G to be both PSD and symmetric
G = cvx.Variable((n_points, n_points), PSD=True)
G.value = np.zeros((n_points, n_points))
# spread out points in target manifold
objective = cvx.Maximize(cvx.trace(G))
constraints = []
# add distance-preserving constraints
for i in range(n_points):
    for j in range(n_points):
        if nn[i, j] == 1:
            constraints.append(
                (X[i, i] - 2 * X[i, j] + X[j, j]) -
                (G[i, i] - 2 * G[i, j] + G[j, j]) == 0
            )
problem = cvx.Problem(objective, constraints)
problem.solve(verbose=True, max_iters=10000000)
print(problem.status)
However, CVXPY tells me the problem is infeasible. If I remove the distance constraints, the solver says the problem is unbounded, which makes sense: without them, nothing bounds the trace of the PSD variable from above. My question, then, is what mistake(s) I am making in my formulation of MVU. Thank you in advance!
I omitted, in the code above, a verification that the adjacency graph is connected. To do that, I simply perform a DFS on the adjacency graph and assert that the number of visited nodes equals the number of rows in the data. For n = 4 neighbors, this holds.
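For reference, here is a minimal sketch of that check (a hand-rolled DFS; I symmetrize the kNN graph first, since kneighbors_graph returns a directed graph):
def graph_is_connected(adj):
    # iterative DFS from node 0 over a 0/1 adjacency matrix
    n = adj.shape[0]
    visited = set()
    stack = [0]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        stack.extend(j for j in range(n) if adj[node, j] == 1 and j not in visited)
    return len(visited) == n
# symmetrize before checking, since i can be a neighbor of j without the converse
assert graph_is_connected(np.maximum(nn, nn.T))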
This is actually a very hard type of SDP problem, because you have a lot of near-linear dependencies. Even the best implementations will run into some sort of numerical issues and will require relaxed tolerances or other tricks. It really is a bit nasty.
Regarding your code:
- Scale your data: X is of order 1e+8, which makes the solver immediately go numerically bananas. For example, divide it by 1e+6. That advice would apply to any problem, actually.
- Add the normalizing constraint sum(G) = 0. Without it the problem can be unbounded, and the solver will also go bananas, just a bit later. You can see huge values appearing in the SCS log output. (A CVXPY sketch of both changes follows the log excerpt below.)

With these changes I was able to get SCS to start converging. After a while it looked like
8500| 1.01e+00 1.11e-04 6.31e+00 -1.64e+06 5.49e-04 6.15e+02
8750| 1.01e+00 1.11e-04 6.39e+00 -1.64e+06 5.49e-04 6.33e+02
I will not wait for it to finish though. You may need to fiddle with tolerances to relax them if you want it to terminate.
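For concreteness, here is a minimal sketch of those two changes in your CVXPY formulation (same variable names as your code):
# rescale so the entries of X are of moderate order
X = cvx.Constant(data.dot(data.T) / 1e6)
G = cvx.Variable((n_points, n_points), PSD=True)
objective = cvx.Maximize(cvx.trace(G))
# the normalizing (zero-centering) constraint from the paper
constraints = [cvx.sum(G) == 0]
# ...then append the distance-preserving constraints exactly as before...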
I would advise you to use cvx.MOSEK (where, disclaimer, I work) as a solver, but it seems the CVXPY compilation procedure for MOSEK runs into some issues and starts eating a lot of RAM, taking forever. So instead, here is a direct implementation in the MOSEK Fusion API (with the rescaling and the normalizing constraint):
from mosek.fusion import *
import scipy.io
import numpy as np
from sklearn.neighbors import NearestNeighbors
import sys
data = scipy.io.loadmat('teapots.mat')
data = data["Input"][0][0][0]
# make it (400 x 23028)
data = data.T.astype(np.float64)
n_points = data.shape[0]
# create sparse matrix (400 x 400) with the connections
nn = NearestNeighbors(n_neighbors=4).fit(data)
nn = nn.kneighbors_graph(data).todense()
nn = np.array(nn)
# inner product matrix of the original data
X = data.dot(data.T)/10**6
print(X)
# inner product matrix of the projected points; PSD constrains G to be both PSD and symmetric
M = Model()
G = M.variable(Domain.inPSDCone(n_points))
M.constraint(Expr.sum(G), Domain.equalsTo(0))
# spread out points in target manifold
M.objective(ObjectiveSense.Maximize, Expr.sum(G.diag()))
# add distance-preserving constraints
for i in range(n_points):
    for j in range(n_points):
        if nn[i, j] == 1:
            M.constraint(Expr.add([G.index([i,i]), G.index([j,j]), Expr.mul(-2, G.index([i,j]))]),
                         Domain.equalsTo(X[i, i] - 2 * X[i, j] + X[j, j]))
# Print log
M.setLogHandler(sys.stdout)
# Reduce accuracy
M.setSolverParam("intpntCoTolRelGap", 1e-5)
M.setSolverParam("intpntCoTolPfeas",1e-5)
M.setSolverParam("intpntCoTolDfeas",1e-5)
#Solve
M.solve()
Gresult = G.level().reshape((n_points,n_points))
print(Gresult)
The solution looks OK:
Interior-point solution summary
Problem status : PRIMAL_AND_DUAL_FEASIBLE
Solution status : OPTIMAL
Primal. obj: 1.5209452618e+06 nrm: 4e+03 Viol. con: 2e-02 var: 0e+00 barvar: 0e+00
Dual. obj: 1.5209449262e+06 nrm: 8e+03 Viol. con: 0e+00 var: 0e+00 barvar: 1e-04
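If you then want the low-dimensional coordinates rather than the Gram matrix, the paper recovers them from the top eigenvectors of G. A sketch of that post-processing step, assuming Gresult from the code above (the coordinates inherit the 1e-6 rescaling of X, which only affects scale, not shape):
# G = Y Y^T, so an embedding Y comes from the top eigenpairs of G
eigvals, eigvecs = np.linalg.eigh(Gresult)   # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:2]          # indices of the two largest eigenvalues
Y = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))
print(Y.shape)  # (400, 2): one 2-D point per teapot image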