Efficient numpy submatrix view

I wish to apply Hungarian algorithm to many subsets of numpy matrix C indexed by cross products of lists row_ind, col_ind. Currently, I see the following options to do so:

Double slicing:

linear_sum_assignment(C[row_ind,:][:,col_ind])

Problem: two copies per subset operation.

Advanced slicing via np.ix_:

linear_sum_assignment(C[np.ix_(row_ind, col_ind)])

Problem: one copy per subset, np.ix_ is inefficient (allocates n x n matrix).

UPDATE: as noted by @hpaulj, np.ix_ doesn't it fact allocate n x n matrix, but it is somehow still slower than 1.

Masked array.

Problem: doesn't work with linear_sum_assignment.

So, no option is satisfying.

What is ideally desired is an ability to specify a submatrix view using the matrix C and a couple of unidimensional masks for rows and cols respectively, so such a view could be passed to linear_sum_assignment. For another linear_sum_assignment call, I would quickly adjust masks but never modify or copy/subset the full matrix.

Is there something similar already available in numpy?

What is the most efficient way (as little copies/memory allocations as possible) to process multiple submatrices of the same big matrix?

Solution

The different ways of indexing an array with a lists/arrays time about the same. They all produce copies, not views.

For example

In [99]: arr = np.ones((1000,1000),int)
In [100]: id1=np.arange(0,1000,10)
In [101]: id2=np.arange(0,1000,20)

In [105]: timeit arr[id1,:][:,id2].shape
52.5 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [106]: timeit arr[np.ix_(id1,id2)].shape
66.5 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In contrast if I use slices (in this case selecting the same elements), I get a view, which is much faster:

In [107]: timeit arr[::10,::20].shape
661 ns ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

ix_ doesn't create a (m,n) array; it returns a tuple of adjusted 1d arrays. It's the equivalent of

In [108]: timeit arr[id1[:,None], id2].shape
54.5 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The timing difference is primarily due to an extra layer of function calls.

Your scipy link has a [source] link:

https://github.com/scipy/scipy/blob/v0.19.1/scipy/optimize/_hungarian.py#L13-L107

This optimize.linear_sum_assignment function creates a _Hungary object with the cost_matrix. That makes a copy, and solves the problem by searching and manipulating its values.

Using the documentation example:

In [110]: optimize.linear_sum_assignment(cost)
Out[110]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))

What it does is create a state object:

In [111]: H=optimize._hungarian._Hungary(cost)
In [112]: vars(H)
Out[112]: 
{'C': array([[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]),
 'Z0_c': 0,
 'Z0_r': 0,
 'col_uncovered': array([ True,  True,  True], dtype=bool),
 'marked': array([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]),
 'path': array([[0, 0],
        [0, 0],
        [0, 0],
        [0, 0],
        [0, 0],
        [0, 0]]),
 'row_uncovered': array([ True,  True,  True], dtype=bool)}

It iterates,

In [113]: step=optimize._hungarian._step1
In [114]: while step is not None:
     ...:     step = step(H)
     ...:

And the resulting state is:

In [115]: vars(H)
Out[115]: 
{'C': array([[1, 0, 1],
        [0, 0, 4],
        [0, 1, 0]]),
 'Z0_c': 0,
 'Z0_r': 1,
 'col_uncovered': array([False, False, False], dtype=bool),
 'marked': array([[0, 1, 0],
        [1, 0, 0],
        [0, 0, 1]]),
 'path': array([[1, 0],
        [0, 0],
        [0, 0],
        [0, 0],
        [0, 0],
        [0, 0]]),
 'row_uncovered': array([ True,  True,  True], dtype=bool)}

The solution is pulled from the marked array

In [116]: np.where(H.marked)
Out[116]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))

The total cost is the sum of these values:

In [122]: cost[np.where(H.marked)]
Out[122]: array([1, 2, 2])

But the cost from the C array in the final state is 0:

In [124]: H.C[np.where(H.marked)]
Out[124]: array([0, 0, 0])

So even if the submatrix that you give to optimize.linear_sum_assignment is a view, the search still involves a copy. The search space and time increases significantly with the size of this cost matrix.