numpy, scipy, hungarian-algorithm

Efficient numpy submatrix view


I wish to apply the Hungarian algorithm to many subsets of a numpy matrix C, indexed by the cross product of lists row_ind and col_ind. Currently, I see the following options to do so:

  1. Double slicing:

    linear_sum_assignment(C[row_ind,:][:,col_ind])
    

Problem: two copies per subset operation.

  2. Advanced indexing via np.ix_:

    linear_sum_assignment(C[np.ix_(row_ind, col_ind)])
    

Problem: one copy per subset; np.ix_ is inefficient (allocates an n × n matrix).

UPDATE: as noted by @hpaulj, np.ix_ doesn't in fact allocate an n × n matrix, but it is still somewhat slower than option 1.

  3. Masked array.

Problem: doesn't work with linear_sum_assignment.

So, none of the options is satisfactory.

What is ideally desired is the ability to specify a submatrix view of the matrix C using a pair of one-dimensional masks for rows and columns respectively, so that such a view could be passed to linear_sum_assignment. For the next linear_sum_assignment call, I would quickly adjust the masks, but never modify or copy/subset the full matrix.

Is there something similar already available in numpy?

What is the most efficient way (as little copies/memory allocations as possible) to process multiple submatrices of the same big matrix?
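For reference, here is a minimal runnable sketch of options 1 and 2 on a toy cost matrix (the matrix and index lists are made up for illustration). Both produce equal results, and `np.shares_memory` confirms both are copies, not views:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.arange(36, dtype=float).reshape(6, 6)  # toy cost matrix
row_ind = [0, 2, 4]
col_ind = [1, 3, 5]

# Option 1: double slicing -- two temporary copies
sub1 = C[row_ind, :][:, col_ind]

# Option 2: np.ix_ -- one copy via broadcasting of the two index arrays
sub2 = C[np.ix_(row_ind, col_ind)]

assert np.array_equal(sub1, sub2)
# Neither result shares memory with C: both are copies, not views
assert not np.shares_memory(C, sub1)
assert not np.shares_memory(C, sub2)

r, c = linear_sum_assignment(sub2)
```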


Solution

  • The different ways of indexing an array with lists/arrays all time about the same. They all produce copies, not views.

    For example

    In [99]: arr = np.ones((1000,1000),int)
    In [100]: id1=np.arange(0,1000,10)
    In [101]: id2=np.arange(0,1000,20)
    
    In [105]: timeit arr[id1,:][:,id2].shape
    52.5 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [106]: timeit arr[np.ix_(id1,id2)].shape
    66.5 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    In contrast, if I use slices (in this case selecting the same elements), I get a view, which is much faster:

    In [107]: timeit arr[::10,::20].shape
    661 ns ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
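The view-versus-copy behaviour is easy to verify directly with `np.shares_memory` (a small check using the same arrays as above):

```python
import numpy as np

arr = np.ones((1000, 1000), int)
id1 = np.arange(0, 1000, 10)
id2 = np.arange(0, 1000, 20)

view = arr[::10, ::20]          # basic slicing: a view into arr
copy = arr[np.ix_(id1, id2)]    # advanced indexing: a fresh copy

assert np.shares_memory(arr, view)      # a view shares the buffer
assert not np.shares_memory(arr, copy)  # a copy does not

# Writing through the view changes arr; writing to the copy would not
view[0, 0] = 99
assert arr[0, 0] == 99
```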

    ix_ doesn't create an (m, n) array; it returns a tuple of adjusted 1d arrays. It's the equivalent of:

    In [108]: timeit arr[id1[:,None], id2].shape
    54.5 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    The timing difference is primarily due to an extra layer of function calls.
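To make that equivalence concrete: `np.ix_` just reshapes the index arrays so they broadcast against each other, and all three spellings select the same block (a quick check):

```python
import numpy as np

id1 = np.arange(0, 1000, 10)   # 100 row indices
id2 = np.arange(0, 1000, 20)   # 50 column indices

ix = np.ix_(id1, id2)
# A tuple of two small broadcastable arrays, not a (100, 50) index matrix
assert isinstance(ix, tuple)
assert ix[0].shape == (100, 1)
assert ix[1].shape == (1, 50)

arr = np.ones((1000, 1000), int)
# All three spellings select the same (100, 50) block
a = arr[ix]
b = arr[id1[:, None], id2]
c = arr[id1, :][:, id2]
assert np.array_equal(a, b) and np.array_equal(b, c)
```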

    Your scipy link has a [source] link:

    https://github.com/scipy/scipy/blob/v0.19.1/scipy/optimize/_hungarian.py#L13-L107

    The optimize.linear_sum_assignment function creates a _Hungary object from the cost_matrix. That constructor makes a copy, and the algorithm solves the problem by searching and manipulating the copy's values.
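That internal copy is visible from the outside: the caller's cost matrix is never modified, even though the solver mutates its working matrix while searching (a quick check, using the documentation's 3×3 cost matrix):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])
before = cost.copy()

row_ind, col_ind = linear_sum_assignment(cost)

# The solver worked on an internal copy, so the input is untouched
assert np.array_equal(cost, before)
```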

    Using the documentation example:

    In [109]: cost = np.array([[4, 1, 3],
         ...:                  [2, 0, 5],
         ...:                  [3, 2, 2]])
    In [110]: optimize.linear_sum_assignment(cost)
    Out[110]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
    

    What it does is create a state object:

    In [111]: H=optimize._hungarian._Hungary(cost)
    In [112]: vars(H)
    Out[112]: 
    {'C': array([[4, 1, 3],
            [2, 0, 5],
            [3, 2, 2]]),
     'Z0_c': 0,
     'Z0_r': 0,
     'col_uncovered': array([ True,  True,  True], dtype=bool),
     'marked': array([[0, 0, 0],
            [0, 0, 0],
            [0, 0, 0]]),
     'path': array([[0, 0],
            [0, 0],
            [0, 0],
            [0, 0],
            [0, 0],
            [0, 0]]),
     'row_uncovered': array([ True,  True,  True], dtype=bool)}
    

    It then iterates over the algorithm's steps:

    In [113]: step=optimize._hungarian._step1
    In [114]: while step is not None:
         ...:     step = step(H)
         ...:     
    

    And the resulting state is:

    In [115]: vars(H)
    Out[115]: 
    {'C': array([[1, 0, 1],
            [0, 0, 4],
            [0, 1, 0]]),
     'Z0_c': 0,
     'Z0_r': 1,
     'col_uncovered': array([False, False, False], dtype=bool),
     'marked': array([[0, 1, 0],
            [1, 0, 0],
            [0, 0, 1]]),
     'path': array([[1, 0],
            [0, 0],
            [0, 0],
            [0, 0],
            [0, 0],
            [0, 0]]),
     'row_uncovered': array([ True,  True,  True], dtype=bool)}
    

    The solution is pulled from the marked array:

    In [116]: np.where(H.marked)
    Out[116]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
    

    The total cost is the sum of these values:

    In [122]: cost[np.where(H.marked)]
    Out[122]: array([1, 2, 2])
    

    But the cost from the C array in the final state is 0:

    In [124]: H.C[np.where(H.marked)]
    Out[124]: array([0, 0, 0])
    

    So even if the submatrix that you give to optimize.linear_sum_assignment is a view, the search still involves a copy. And the search space and time increase significantly with the size of this cost matrix.
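Putting it together, the practical pattern is to accept the one copy per call (via `np.ix_`) and keep each subset small, then map the local solution back to indices of the full matrix. A sketch (the matrix and index sets below are made up for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.random.default_rng(0).integers(0, 100, size=(50, 50))

# Hypothetical collection of (row_ind, col_ind) subsets to solve
subsets = [
    (np.array([0, 5, 9]), np.array([2, 4, 7])),
    (np.array([1, 3]),    np.array([0, 8])),
]

results = []
for rows, cols in subsets:
    sub = C[np.ix_(rows, cols)]          # one copy per subset
    r, c = linear_sum_assignment(sub)
    # Translate the local assignment back to full-matrix indices
    results.append((rows[r], cols[c], sub[r, c].sum()))
```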