pythonnumpyperformance

Bottleneck using np.where (in term of computing ressources i.e. memory and speed)


In the current example, I spent (many) hours in order to find a way to sort the M matrix so that it corresponds exactly to the target one.

For big arrays (hundreds thousands of lines, even millions), the amount of memory is huge and this step time is consuming (bottleneck).

# -*- coding: utf-8 -*-
import numpy as np

M = np.array([ [ 5  , 15 , 6 ],
               [ 14 , 15 , 14 ],
               [ 5  , 11 , 350 ],
               [ 5  , 11, 352 ],
               [ 5  , 11 , 351 ],
               [ 5  , 11 , 350 ],
               [ 9  , 11 , 351 ],
               [ 9  , 11 , 95 ],
               [ 9  , 11 , 353 ],
               [ 9  , 11 , 354 ],
               [ 28 , 15 , 28 ],
               [ 2  , 8 , 46 ],
               [ 2  , 8 , 353 ],
               [ 2  , 8 , 45 ],
               [ 21 , 15 , 21 ],
               [ 31 , 20 , 355 ],
               [ 31 , 20 , 358 ]])


matrixOrder = np.array([ [14, 15],
                         [2 , 8],
                         [31, 20],
                         [5 , 11],
                         [21, 15],
                         [9, 11],
                         [5, 15],
                         [28, 15] ])
                                           
Target = np.array([ [ 14 , 15 , 14 ],
                    [ 2 , 8 , 46 ],
                    [ 2 , 8 , 353 ],
                    [ 2 , 8 , 45 ],
                    [ 31 , 20 , 355 ],
                    [ 31 , 20 , 358 ],
                    [ 5 , 11 , 350 ],
                    [ 5 , 11 , 352 ],
                    [ 5 , 11 , 351 ],
                    [ 5 , 11 , 350 ],
                    [ 21 , 15 , 21 ],
                    [ 9 , 11 , 351 ],
                    [ 9 , 11 , 95 ],
                    [ 9 , 11 , 353 ],
                    [ 9 , 11 , 354 ],
                    [ 5 , 15 , 6 ],
                    [ 28 , 15 , 28 ] ])
                                        
index = np.where( (matrixOrder[:, 0].reshape(-1, 1) == M[:, 0].tolist()) &
                  (matrixOrder[:, 1].reshape(-1, 1) == M[:, 1].tolist()) )

T = M[index[1], :]
check = np.array_equal(Target, T)

Solution

  • Try doing a dictionary lookup instead of using np.where

    something like -

    def sort_mat(M, ord):
        # Convert order pairs to tuples
        k = [tuple(r) for r in ord]
        
        # Make lookup dictionary
        d = {k: i for i,k in enumerate(k)}
        
        # Get sorted indices
        idx = sorted(range(len(M)), key=lambda x: d.get((M[x,0], 
        M[x,1]), len(k)))
        
        #return the sorted matrix
        return M[idx]