python numpy numpy-ndarray numpy-ufunc

np.argsort() implementation is not found


I would like to see how numpy.argsort() works.

  1. In the documentation, the source for numpy.argsort() is numpy.core.fromnumeric.py. This is understandable. https://numpy.org/doc/stable/reference/generated/numpy.argsort.html

  2. core.fromnumeric.argsort() is a bit more complicated.
    Ignoring decorators, fromnumeric.argsort(arr) returns _wrapfunc(arr, "argsort"), which returns arr.argsort(). This is not a problem.
    Assuming arr is a numpy.ndarray, it might be in array_api.__init__.py. https://github.com/numpy/numpy/blob/v1.21.0/numpy/core/fromnumeric.py

  3. array_api.argsort() is from array_api._sorting_functions.argsort(). OK. https://github.com/numpy/numpy/blob/main/numpy/array_api/__init__.py

  4. _sorting_functions.argsort() calls numpy.argsort(). That is what I was looking for at first. It is circular. https://github.com/numpy/numpy/blob/main/numpy/array_api/_sorting_functions.py

Extra

  5. In numpy.__init__.pyi, numpy.argsort() is from core.fromnumeric. https://github.com/numpy/numpy/blob/main/numpy/__init__.pyi

    Items 1 and 5 are the same thing.

Are these circular references? Of course, I know these work. Is my guess in step 2, that it might be in array_api.__init__.py, wrong? If so, where is the actual implementation located?


Background on this issue

I noticed that np.unique is slow when return_index=True. I wanted to run np.unique on the sorted array, but found that np.unique calls np.argsort. So I tried to find out the difference between np.argsort and np.sort and needed to know more about np.argsort.
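To make the relationship between the two functions concrete, here is a minimal sketch that assumes only the public NumPy API. np.argsort returns the permutation that sorts the array, while np.sort returns the sorted values. The last part rebuilds the return_index result by hand from a stable argsort; it is an illustration of the idea, not a copy of np.unique's internals.

```python
import numpy as np

arr = np.array([3, 1, 2, 1, 3])

# argsort gives the indices that would sort arr; sort gives the values.
order = np.argsort(arr, kind="stable")
assert np.array_equal(arr[order], np.sort(arr))

# Hand-rolled equivalent of np.unique(arr, return_index=True):
# keep the first element of every run of equal values in sorted order.
sorted_arr = arr[order]
is_first = np.concatenate(([True], sorted_arr[1:] != sorted_arr[:-1]))
values = sorted_arr[is_first]
first_index = order[is_first]  # stable sort => earliest original position

ref_values, ref_index = np.unique(arr, return_index=True)
assert np.array_equal(values, ref_values)
assert np.array_equal(first_index, ref_index)
```

The extra work for return_index=True comes from needing the permutation (order) in addition to the sorted values.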


Solution

  • Why do you want to see the source? To implement it in your own C project? I don't think it will help you use it more effectively in Python. In an IPython session I use ??:

    In [22]: np.argsort??
    ...
    return _wrapfunc(a, 'argsort', axis=axis, kind=kind, order=order)
    

    OK, that's the typical case of a function passing the buck to the method. The function version will convert the input to an array if necessary, and then call the array's method. Typically the function version has more complete documentation, but the functionality is basically the same.
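A small check of that delegation, as a sketch using only public NumPy calls: the function accepts a plain list, while the method needs an existing array, and both give the same result.

```python
import numpy as np

data = [30, 10, 20]  # a plain Python list, not an ndarray

# Function form: converts the input to an array, then defers to the method.
via_function = np.argsort(data)

# Method form: the conversion has to be done explicitly first.
via_method = np.asarray(data).argsort()

assert np.array_equal(via_function, via_method)
```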

    In [21]: arr.argsort??
    Type:      builtin_function_or_method
    

    Usually that's the end of the story.
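The same dead end can be reproduced outside IPython. This sketch uses the standard-library inspect module to show that the method is a compiled builtin with no Python source to retrieve.

```python
import inspect
import numpy as np

arr = np.array([3, 1, 2])

# The bound method is implemented in C, so Python sees it as a builtin.
method_is_builtin = inspect.isbuiltin(arr.argsort)

# Asking for its source fails because there is no .py file behind it.
try:
    inspect.getsource(arr.argsort)
    source_available = True
except (TypeError, OSError):
    source_available = False
```

Here method_is_builtin is True and source_available is False, which is why the search has to continue in the C sources.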

    The other route is to click the [source] link on the documentation. Here that leads to the same thing.

    Notice:

    @array_function_dispatch(_argsort_dispatcher)
    

    Recent versions have added this dispatch layer; check the release notes for more details. In my experience that just makes searching for code harder.

    The other step is to go to GitHub and do a search. Sometimes that turns up some useful bit, but often it's a wild goose chase.

    As a user I don't need to know the "how" details. It's easy enough to read the docs, and then do some experiments if I still have questions. Digging into the C code will not help me use it better.

    As for your added question:

    All ndarray objects are "multiarray", with anything from 0 to 32 dimensions.

    GitHub

    On the NumPy GitHub repository I searched for argsort, and chose the most promising file, numpy/core/src/multiarray/methods.c

    This has the function

    array_argsort(PyArrayObject *self,
            PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames)
    

    Skipping over code that appears to handle the input arguments, it looks like the work is done in this call:

    res = PyArray_ArgSort(self, axis, sortkind);
    

    That appears to be defined in numpy/core/src/multiarray/item_selection.c

     PyArray_ArgSort(PyArrayObject *op, int axis, NPY_SORTKIND which)
     ...
     if (argsort == NULL) {
        if (PyArray_DESCR(op)->f->compare) {
            switch (which) {
                default:
                case NPY_QUICKSORT:
                    argsort = npy_aquicksort;
                    break;
                case NPY_HEAPSORT:
                    argsort = npy_aheapsort;
                    break;
                case NPY_STABLESORT:
                    argsort = npy_atimsort;
                    break;
       ...
       ret = _new_argsortlike(op2, axis, argsort, NULL, NULL, 0);
    

    and so on ....
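From the Python side, the switch on the sort kind in that excerpt appears to correspond to the kind argument of argsort (that mapping is my reading of the excerpt, not something the C code states). A short sketch with the documented kind values shows that every choice yields a valid sorting permutation, and that "stable" keeps ties in their original order.

```python
import numpy as np

arr = np.array([2, 1, 2, 1, 0])

# Every documented kind must produce a permutation that sorts the array.
for kind in ("quicksort", "heapsort", "stable"):
    order = np.argsort(arr, kind=kind)
    assert np.array_equal(arr[order], np.sort(arr))

# Only the stable kind guarantees the order of equal elements.
stable_order = np.argsort(arr, kind="stable")
```

With this input, stable_order is [4, 1, 3, 0, 2]: the two 1s (indices 1 and 3) and the two 2s (indices 0 and 2) keep their original relative order.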

    None of that helps me use it any better.