I wish to apply Hungarian algorithm to many subsets of numpy matrix C
indexed by cross products of lists row_ind
, col_ind
. Currently, I see the following options to do so:
Double slicing:
linear_sum_assignment(C[row_ind,:][:,col_ind])
Problem: two copies per subset operation.
Advanced slicing via np.ix_
:
linear_sum_assignment(C[np.ix_(row_ind, col_ind)])
Problem: one copy per subset, np.ix_
is inefficient (allocates n x n matrix).
UPDATE: as noted by @hpaulj, np.ix_
doesn't it fact allocate n x n matrix, but it is somehow still slower than 1.
Problem: doesn't work with linear_sum_assignment
.
So, no option is satisfying.
What is ideally desired is an ability to specify a submatrix view using the matrix C
and a couple of unidimensional masks for rows and cols respectively, so such a view could be passed to linear_sum_assignment
. For another linear_sum_assignment
call, I would quickly adjust masks but never modify or copy/subset the full matrix.
Is there something similar already available in numpy?
What is the most efficient way (as little copies/memory allocations as possible) to process multiple submatrices of the same big matrix?
The different ways of indexing an array with a lists/arrays time about the same. They all produce copies, not views.
For example
In [99]: arr = np.ones((1000,1000),int)
In [100]: id1=np.arange(0,1000,10)
In [101]: id2=np.arange(0,1000,20)
In [105]: timeit arr[id1,:][:,id2].shape
52.5 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [106]: timeit arr[np.ix_(id1,id2)].shape
66.5 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In contrast if I use slices (in this case selecting the same elements), I get a view
, which is much faster:
In [107]: timeit arr[::10,::20].shape
661 ns ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
ix_
doesn't create a (m,n) array; it returns a tuple of adjusted 1d arrays. It's the equivalent of
In [108]: timeit arr[id1[:,None], id2].shape
54.5 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The timing difference is primarily due to an extra layer of function calls.
Your scipy
link has a [source] link:
https://github.com/scipy/scipy/blob/v0.19.1/scipy/optimize/_hungarian.py#L13-L107
This optimize.linear_sum_assignment
function creates a _Hungary
object with the cost_matrix
. That makes a copy, and solves the problem by searching and manipulating its values.
Using the documentation example:
In [110]: optimize.linear_sum_assignment(cost)
Out[110]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
What it does is create a state object:
In [111]: H=optimize._hungarian._Hungary(cost)
In [112]: vars(H)
Out[112]:
{'C': array([[4, 1, 3],
[2, 0, 5],
[3, 2, 2]]),
'Z0_c': 0,
'Z0_r': 0,
'col_uncovered': array([ True, True, True], dtype=bool),
'marked': array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]),
'path': array([[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]),
'row_uncovered': array([ True, True, True], dtype=bool)}
It iterates,
In [113]: step=optimize._hungarian._step1
In [114]: while step is not None:
...: step = step(H)
...:
And the resulting state is:
In [115]: vars(H)
Out[115]:
{'C': array([[1, 0, 1],
[0, 0, 4],
[0, 1, 0]]),
'Z0_c': 0,
'Z0_r': 1,
'col_uncovered': array([False, False, False], dtype=bool),
'marked': array([[0, 1, 0],
[1, 0, 0],
[0, 0, 1]]),
'path': array([[1, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]),
'row_uncovered': array([ True, True, True], dtype=bool)}
The solution is pulled from the marked
array
In [116]: np.where(H.marked)
Out[116]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
The total cost is the sum of these values:
In [122]: cost[np.where(H.marked)]
Out[122]: array([1, 2, 2])
But the cost from the C
array in the final state is 0:
In [124]: H.C[np.where(H.marked)]
Out[124]: array([0, 0, 0])
So even if the submatrix that you give to optimize.linear_sum_assignment
is a view, the search still involves a copy. The search space and time increases significantly with the size of this cost matrix.