sortingnumpynumexpr

Parallel in-place sort for numpy arrays


I often need to sort large numpy arrays (few billion elements), which became a bottleneck of my code. I am looking for a way to parallelize it.

Are there any parallel implementations for the ndarray.sort() function? Numexpr module provides parallel implementation for most math operations on numpy arrays, but lacks sorting capabilities.

Maybe, it is possible to make a simple wrapper around a C++ implementation of parallel sorting, and use it through Cython?


Solution

  • I ended up wrapping GCC parallel sort. Here is the code:

    parallelSort.pyx

    # cython: wraparound = False
    # cython: boundscheck = False
    import numpy as np
    cimport numpy as np
    import cython
    cimport cython 
    
    ctypedef fused real:
        cython.char
        cython.uchar
        cython.short
        cython.ushort
        cython.int
        cython.uint
        cython.long
        cython.ulong
        cython.longlong
        cython.ulonglong
        cython.float
        cython.double
    
    cdef extern from "<parallel/algorithm>" namespace "__gnu_parallel":
        cdef void sort[T](T first, T last) nogil 
    
    def numpyParallelSort(real[:] a):
        "In-place parallel sort for numpy types"
        sort(&a[0], &a[a.shape[0]])
    

    Extra compiler args: -fopenmp (compile) and -lgomp (linking)

    This makefile will do it:

    all:
        cython --cplus parallelSort.pyx  
        g++  -g -march=native -Ofast -fpic -c    parallelSort.cpp -o parallelSort.o -fopenmp `python-config --includes`
        g++  -g -march=native -Ofast -shared  -o parallelSort.so parallelSort.o `python-config --libs` -lgomp 
    
    clean:
        rm -f parallelSort.cpp *.o *.so
    

    And this shows that it works:

    from parallelSort import numpyParallelSort
    import numpy as np 
    a = np.random.random(100000000)
    
    numpyParallelSort(a) 
    print a[:10]
    

    edit: fixed bug noticed in the comment below