rlinear-programminglpsolvelinear-optimization

How to speed up linear programming problem solution?


I have the following problem. I have n (typically n = 1000) data points (integers from {1,2,3}, so there are a lot of repeting numbers) and a real number d. I have to choose k<n points (k is given) which minimize the distance between the mean of those k points and point d. This can be expressed as a MILP problem (please see here).

I tried to solve that in R using lpSolve and Rglpk packages but it takes a lot of time to solve it (I tried to solve it for n = 100 points and the code has been running for 40 minutes already). I guess the issue is that there are lots of binary variables (n) and there are also repeating numbers.

library(Rglpk)
set.seed(123)
sampsize <- sample(c(1,2,3),size=100,replace = TRUE)
k <- 50
d <- 86/47
lngth <- length(sampsize)
f.obj <- c(1,rep(0,lngth))
f.con <- matrix(c(0,rep(1,lngth),-1,sampsize/k,1,sampsize/k),nrow=3, byrow = TRUE)
f.dir <- c("==","<=",">=")  
f.rhs <- c(k,d,d)

f.types <- c("C",rep("B",lngth))

res <- Rglpk_solve_LP(obj=f.obj,mat=f.con,dir=f.dir,rhs=f.rhs,max=FALSE,types=f.types)

I will be satisfied with a sub-optimal solution. Is there a way to solve it quickly or re-express the problem in a certain way to speed up the algorithm?

I would appreciate any input on this.


Solution

  • CVXR is a much better tool for this:

    #
    # generate random data
    #
    
    set.seed(123)
    N <- 100           # sample size
    v <- c(1,2,3)      # sample from these unique values
    M <- length(v)     # number of unique values
    
    data <- sample(v, size=N, replace=TRUE)
    tab <- table(data) # tabulate  
    
    K <- 50            # number of points to choose
    target <- 86/47    # target for mean 
    
    #
    #  CVXR model
    #  see https://cvxr.rbind.io/
    #
    library(CVXR)
    
    # select a number of values from each bin
    select <- Variable(M,integer=T) 
    
    # obj: sum of absolute deviations
    objective = Minimize(abs(sum(v*select)/K-target))
    # include nonnegativity constraints
    constraints = list(sum(select)==K, select >= 0, select <= vec(tab))
    problem <- Problem(objective, constraints)
    
    sol <- solve(problem,verbose=T)
    
    cat("\n")
    cat("Status:",sol$status,"\n")
    cat("Objective:",sol$value,"\n")
    cat("Solution:",sol$getValue(select),"\n")
    

    Output:

    GLPK Simplex Optimizer, v4.65
    9 rows, 4 columns, 17 non-zeros
          0: obj =   0.000000000e+00 inf =   5.183e+01 (2)
          3: obj =   5.702127660e-01 inf =   0.000e+00 (0)
    *     4: obj =   1.065814104e-16 inf =   0.000e+00 (0)
    OPTIMAL LP SOLUTION FOUND
    GLPK Integer Optimizer, v4.65
    9 rows, 4 columns, 17 non-zeros
    3 integer variables, none of which are binary
    Integer optimization begins...
    Long-step dual simplex will be used
    +     4: mip =     not found yet >=              -inf        (1; 0)
    +    55: >>>>>   1.021276596e-02 >=   9.787234043e-03   4.2% (52; 0)
    +    56: >>>>>   9.787234043e-03 >=   9.787234043e-03 < 0.1% (16; 36)
    +    56: mip =   9.787234043e-03 >=     tree is empty   0.0% (0; 103)
    INTEGER OPTIMAL SOLUTION FOUND
    
    Status: optimal 
    Objective: 0.009787234 
    Solution: 26 7 17