
Accessing a large numpy array while preserving its order


I would like to access a NumPy array data via an index array idx while preserving the order of the elements in data. Below is an example where the elements come back in the order given by idx rather than in their order in the original array.

In [125]: data = np.array([2, 2.2, 2.5])

In [126]: idx=np.array([1,0])

In [127]: data[idx]
Out[127]: array([2.2, 2. ])

I would like to get [2, 2.2] instead. Is there a highly efficient way to do so? In my setting, data holds more than a million floating-point numbers, and idx holds about 0.1 million (100,000) integers.

Important info: The array data can be preprocessed if needed; the data come from image processing work. For example, if data needs to be sorted beforehand, the time spent on sorting would not count toward the measured performance. On the other hand, I would rather not process idx much at runtime, since the time spent on it does count. E.g., sorting idx with an O(n log n) algorithm may be too expensive.


Solution

  • Create a boolean 'mask'

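     # Boolean mask over data: True at every position listed in idx
     mask = np.zeros(data.shape, dtype=bool)
     mask[idx] = True
     # Boolean indexing returns the selected elements in data's own order
     res = data[mask]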
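
The mask approach above can be checked against the example from the question with a minimal, self-contained sketch. The helper name select_in_original_order is hypothetical, introduced only for illustration. The sketch assumes idx holds valid positions into a one-dimensional data array.

```python
import numpy as np


def select_in_original_order(data, idx):
    """Return the elements of data at positions idx, ordered as in data.

    Hypothetical helper name, used here only for illustration.
    """
    # One boolean flag per element of data, all False initially
    mask = np.zeros(data.shape, dtype=bool)
    # Mark every requested position; the order of idx does not matter
    mask[idx] = True
    # Boolean indexing scans data front to back, so the original order is kept
    return data[mask]


data = np.array([2, 2.2, 2.5])
idx = np.array([1, 0])

res = select_in_original_order(data, idx)
print(res)  # the values 2.0 and 2.2, in the order they appear in data

# When idx has no duplicates, this matches indexing with a sorted idx
assert np.array_equal(res, data[np.sort(idx)])

# A repeated index only sets the same flag again, so duplicates collapse
res_dup = select_in_original_order(data, np.array([1, 0, 1]))
assert np.array_equal(res_dup, res)
```

This avoids sorting idx: building the mask takes one pass over data and one over idx, at the cost of a temporary boolean array the size of data. If idx may contain repeated entries and the repeats must be kept in the result, the mask is not equivalent to indexing with a sorted idx.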