algorithmdynamic-programminglinear-programminginteger-programminginteger-partition

Algorithm to find some rows from a matrix, whose sum is equal to a given row


For example, here is a matrix:

[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 1],

I want to find some rows, whose sum is equal to [4, 3, 2, 1]. The expected answer is rows: {0,1,3,4}. Because:

[1, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 0] + [1, 1, 1, 1] = [4, 3, 2, 1]

Is there some famous or related algrithoms to resolve the problem?

Thank @sascha and @N. Wouda for the comments. To clarify it, here I provide some more details.

In my problem, the matrix will have about 50 rows and 25 columns. But echo row will just have less than 4 elements (other is zero). And every solution has 8 rows.

If I try all combinations, c(8, 50) is about 0.55 billion times of attempt. Too complex. So I want to find a more effective algrithom.


Solution

  • If you want to make the jump to using a solver, I'd recommend it. This is a pretty straightforward Integer Program. Below solutions use python, python's pyomo math programming package to formulate the problem, and COIN OR's cbc solver for Integer Programs and Mixed Integer Programs, which needs to be installed separately (freeware) available: https://www.coin-or.org/downloading/

    Here is the an example with your data followed by an example with 100,000 rows. The example above solves instantly, the 100,000 row example takes about 2 seconds on my machine.

    # row selection Integer Program
    import pyomo.environ as pyo
    
    data1 = [   [1, 0, 0, 0],
                [1, 1, 0, 0],
                [1, 0, 1, 0],
                [1, 1, 1, 0],
                [1, 1, 1, 1],]
    
    
    data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}
    
    model = pyo.ConcreteModel()
    
    # sets
    model.I = pyo.Set(initialize=range(len(data1)))     # a simple row index
    model.J = pyo.Set(initialize=range(len(data1[0])))  # a simple column index
    
    # parameters
    model.matrix = pyo.Param(model.I , model.J, initialize=data_dict)   # hold the sparse matrix of values
    magic_sum = [4, 3, 2, 1 ]
    
    # variables
    model.row_select = pyo.Var(model.I, domain=pyo.Boolean) # row selection variable
    
    # constraints
    # ensure the columnar sum is at least the magic sum for all j
    def min_sum(model, j):
        return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j] 
    model.c1 = pyo.Constraint(model.J, rule=min_sum)
    
    # objective function
    # minimze the overage
    def objective(model):
        delta = 0
        for j in model.J:
            delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j] 
    
        return delta
    
    model.OBJ = pyo.Objective(rule=objective)
    
    
    model.pprint()  # verify everything
    
    solver = pyo.SolverFactory('cbc')  # need to have cbc solver installed
    
    result = solver.solve(model)
    
    result.write()  # solver details
    
    model.row_select.display() # output
    

    Output:

    # ----------------------------------------------------------
    #   Solver Information
    # ----------------------------------------------------------
    Solver: 
    - Status: ok
      User time: -1.0
      System time: 0.0
      Wallclock time: 0.0
      Termination condition: optimal
      Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
      Statistics: 
        Branch and bound: 
          Number of bounded subproblems: 0
          Number of created subproblems: 0
        Black box: 
          Number of iterations: 0
      Error rc: 0
      Time: 0.01792597770690918
    # ----------------------------------------------------------
    #   Solution Information
    # ----------------------------------------------------------
    Solution: 
    - number of solutions: 0
      number of solutions displayed: 0
    row_select : Size=5, Index=I
        Key : Lower : Value : Upper : Fixed : Stale : Domain
          0 :     0 :   1.0 :     1 : False : False : Boolean
          1 :     0 :   1.0 :     1 : False : False : Boolean
          2 :     0 :   0.0 :     1 : False : False : Boolean
          3 :     0 :   1.0 :     1 : False : False : Boolean
          4 :     0 :   1.0 :     1 : False : False : Boolean
    

    A more stressful rendition with 100,000 rows:

    # row selection Integer Program stress test
    import pyomo.environ as pyo
    import numpy as np
    
    # make a large matrix 100,000 x 8
    data1 = np.random.randint(0, 1000, size=(100_000, 8))
    # inject "the right answer into 3 rows"
    data1[42602]    = [8, 0, 0, 0, 0, 0, 0, 0 ]
    data1[3]        = [0, 0, 0, 0, 4, 3, 2, 1 ]
    data1[10986]    = [0, 7, 6, 5, 0, 0, 0, 0 ]
    
    data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}
    
    model = pyo.ConcreteModel()
    
    # sets
    model.I = pyo.Set(initialize=range(len(data1)))     # a simple row index
    model.J = pyo.Set(initialize=range(len(data1[0])))  # a simple column index
    
    # parameters
    model.matrix = pyo.Param(model.I , model.J, initialize=data_dict)   # hold the sparse matrix of values
    magic_sum = [8, 7, 6, 5, 4, 3, 2, 1 ]
    
    # variables
    model.row_select = pyo.Var(model.I, domain=pyo.Boolean) # row selection variable
    
    # constraints
    # ensure the columnar sum is at least the magic sum for all j
    def min_sum(model, j):
        return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j] 
    model.c1 = pyo.Constraint(model.J, rule=min_sum)
    
    # objective function
    # minimze the overage
    def objective(model):
        delta = 0
        for j in model.J:
            delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j] 
    
        return delta
    
    model.OBJ = pyo.Objective(rule=objective)
    
    solver = pyo.SolverFactory('cbc')
    
    result = solver.solve(model)
    result.write()
    
    
    print('\n\n======== row selections =======')
    for i in model.I:
        if model.row_select[i].value > 0:
            print (f'row {i} selected')
    

    Output:

    # ----------------------------------------------------------
    #   Solver Information
    # ----------------------------------------------------------
    Solver: 
    - Status: ok
      User time: -1.0
      System time: 2.18
      Wallclock time: 2.61
      Termination condition: optimal
      Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
      Statistics: 
        Branch and bound: 
          Number of bounded subproblems: 0
          Number of created subproblems: 0
        Black box: 
          Number of iterations: 0
      Error rc: 0
      Time: 2.800779104232788
    # ----------------------------------------------------------
    #   Solution Information
    # ----------------------------------------------------------
    Solution: 
    - number of solutions: 0
      number of solutions displayed: 0
    
    
    ======== row selections =======
    row 3 selected
    row 10986 selected
    row 42602 selected