pythonnumpynumpy-slicingmatrix-indexing

Using numpy, what is a good way to index into a matrix using another matrix whose entry values are column numbers?


Suppose I want to indepedently re-order each row of a matrix. Here is an example of that using np.argsort():

>>> A
array([[88, 44, 77, 33, 77],
       [33, 55, 66, 88,  0],
       [88,  0,  0, 55, 88],
       [ 0, 22, 44, 88, 33],
       [33, 33, 77, 66, 66]])

>>> ind = np.argsort(A); ind
array([[3, 1, 2, 4, 0],
       [4, 0, 1, 2, 3],
       [1, 2, 3, 0, 4],
       [0, 1, 4, 2, 3],
       [0, 1, 3, 4, 2]])

>>> np.array([A[i][ind[i]] for i in range(ind.shape[0])])
array([[33, 44, 77, 77, 88],
       [ 0, 33, 55, 66, 88],
       [ 0,  0, 55, 88, 88],
       [ 0, 22, 33, 44, 88],
       [33, 33, 66, 66, 77]])

The last expression above (the one that uses range()) is one solution to my problem. My question is: Is there a better way to do this?

The inputs are two matrices like A and ind, both 2-dimensional and of the same size. The matrix A can have any values, and the values of ind are interepreted to be column indexes within A. The output is a new matrix the same size as ind whose values are from A as per the above expression: np.array([A[i][ind[i]] for i in range(ind.shape[0])]).

Each row of ind corresponds to the same row in A. Entry B[i,j] of the output comes from entry A[i, ind[i, j]] of the input.

Note that ind may have fewer columns than A, and I would like to support that case.

I'm asking because my solution (the given expression) is essentially using a for loop, and maybe numpy can do this more quickly using some internal loop. For my application, speed is important, so it would be nice if I can be time-efficient.


Solution

  • np.take_along_axiswas added (recently) to facilitate this kind of indexing. I believe its docs mention its use withargsort`:

    In [113]: A=np.array([[88, 44, 77, 33, 77],
         ...:        [33, 55, 66, 88,  0],
         ...:        [88,  0,  0, 55, 88],
         ...:        [ 0, 22, 44, 88, 33],
         ...:        [33, 33, 77, 66, 66]])
         ...: 
         ...: ind=np.array([[3, 1, 2, 4, 0],
         ...:        [4, 0, 1, 2, 3],
         ...:        [1, 2, 3, 0, 4],
         ...:        [0, 1, 4, 2, 3],
         ...:        [0, 1, 3, 4, 2]])
    
    In [114]: np.take_along_axis(A, np.argsort(A),1)
    Out[114]: 
    array([[33, 44, 77, 77, 88],
           [ 0, 33, 55, 66, 88],
           [ 0,  0, 55, 88, 88],
           [ 0, 22, 33, 44, 88],
           [33, 33, 66, 66, 77]])
    

    I can do the same with advanced indexing, replacing the range iteration with np.arange.

    In [119]: A[np.arange(5),ind.T].T
    Out[119]: 
    array([[33, 44, 77, 77, 88],
           [ 0, 33, 55, 66, 88],
           [ 0,  0, 55, 88, 88],
           [ 0, 22, 33, 44, 88],
           [33, 33, 66, 66, 77]]
    

    I had to add a couple of transpose the get the desired output and layout. I haven't taken time to figure out just why those are needed.

    Applying the argsort to axis 0 can be done with these two:

    In [123]: A[ind,np.arange(5)]
    Out[123]: 
    array([[ 0, 55,  0, 66, 77],
           [33, 44, 66, 55, 33],
           [33,  0, 44, 33, 66],
           [88, 55, 77, 55, 33],
           [88, 55, 44, 66, 88]])
    
    In [124]: np.take_along_axis(A, np.argsort(A),0)
    Out[124]: 
    array([[ 0, 55,  0, 66, 77],
           [33, 44, 66, 55, 33],
           [33,  0, 44, 33, 66],
           [88, 55, 77, 55, 33],
           [88, 55, 44, 66, 88]])