I am working on an optimization problem with a huge number of variables (upwards of hundreds of millions). Each of them should be a 0-1 binary variable.
I can write it in the form (maximize x'Qx) where Q is positive semi-definite, and I am using Julia, so the package COSMO.jl seems like a great fit. However, there is a ton of sparsity in my problem. Q is 0 except on approximately sqrt(|Q|) entries, and for the constraints there are approximately sqrt(|Q|) linear constraints on the variables.
I can describe this system pretty easily using SparseArrays, but it appears the most natural way to input problems into COSMO uses standard arrays. Is there a way I can take advantage of the sparsity in this massive problem?
While there is no sample code in your question, perhaps this could help:
JuMP works with sparse arrays, so perhaps the easiest approach is to use it directly when constructing the objective function:
julia> using JuMP, SparseArrays, COSMO
julia> m = Model(COSMO.Optimizer);
julia> q = sprand(Bool, 20, 20, 0.05)  # for readability I use a binary q
20×20 SparseMatrixCSC{Bool, Int64} with 21 stored entries:
⠀⠀⠀⡔⠀⠀⠀⠀⡀⠀
⠀⠀⠂⠀⠠⠀⠀⠈⠑⠀
⠀⠀⠀⠀⠀⠤⠀⠀⠀⠀
⠀⢠⢀⠄⠆⠀⠂⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠄⠀⠀⠌
julia> @variable(m, x[1:20], Bin);
julia> x'*q*x
x[1]*x[14] + x[14]*x[3] + x[15]*x[8] + x[16]*x[5] + x[18]*x[4] + x[18]*x[13] + x[19]*x[14] + x[20]*x[11]
You can see that the expression is correctly reduced to only the terms corresponding to nonzero entries of q.
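Since your problem also has roughly sqrt(|Q|) linear constraints, the same idea applies there: JuMP accepts sparse constraint matrices, so only the nonzero coefficients enter the model. A minimal sketch, where A and b are hypothetical placeholders for your constraint data:
julia> A = sprand(5, 20, 0.1);  # hypothetical sparse constraint matrix
julia> b = rand(5);             # hypothetical right-hand side
julia> @constraint(m, A * x .<= b);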
Indeed, you can check the performance with a very sparse q having 100M elements:
julia> q = sprand(10000, 10000, 0.000001)
10000×10000 SparseMatrixCSC{Float64, Int64} with 98 stored entries:
...
julia> @variable(m, z[1:10000], Bin);
julia> using BenchmarkTools
julia> @btime $z'*$q*$z
1.276 ms (51105 allocations: 3.95 MiB)
You can see that you get the expected performance when constructing the objective function: the cost is driven by the number of nonzero entries rather than the full 100M-element size of q.
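To then attach such an expression as the model's objective, the standard @objective macro applies; a minimal sketch using the quadratic form above:
julia> @objective(m, Max, z' * q * z);
From there, optimize!(m) would hand the model to the solver once all of your constraints are in place.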