
Understanding sorted in the presence of NaN - Python


Can you help me understand the following:

import numpy as np
sorted([3,2,np.nan, -1])
output:
[-1, 2, 3, nan]
import numpy as np
sorted([1,2,np.nan, -1])
output:
[1, 2, nan, -1]

It is almost as if some sorting has to be triggered before the NaN is reached; otherwise sorted returns the list unchanged.


Solution

  • There are two things to consider here: sorted() uses < (that is, __lt__()) for comparison, and for a nan value the result of this comparison with any number is always False, whichever side the nan is on.

    >>> 1 > np.nan
    False
    >>> 1 < np.nan
    False
    

    As a result, anything built on these comparisons can give inconsistent answers. Here is an example that makes this clearer:

    >>> max(np.nan,1)
    nan
    >>> max(1,np.nan)
    1
    

    As you can see, max() returns a different answer depending on the order of its arguments. That is because the result of the comparison is always False, and max() follows roughly this simple logic:

    # simplified two-argument model of the built-in max()
    def max(a, b):
        if a < b:
            return b
        else:
            return a
    

    This shows why max() fails to return the true maximum here: the comparison is False either way, so it returns its first argument.

    The same thing happens when the sorting algorithm uses the comparison operator: it affects the comparisons made by Timsort. If you want to study this issue further, this GitHub issue may be helpful.
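
To make the effect on sorting concrete, here is a minimal sketch. It is not CPython's actual Timsort code; looks_sorted() is a hypothetical helper that only mimics the kind of "is the next element smaller?" test a comparison sort relies on. The second helper, sort_nan_last(), is likewise an illustrative name and shows one possible way to get a predictable order by keeping nan out of the < comparisons through a key function.

```python
import math

nan = float("nan")  # a plain float NaN, so NumPy is not needed here


def looks_sorted(values):
    """Return True if no element is strictly less than its predecessor."""
    return all(not (b < a) for a, b in zip(values, values[1:]))


# Every "<" test against nan is False, so this list passes the check
# even though -1 is clearly out of place.
print(looks_sorted([1, 2, nan, -1]))   # True

# Here 2 < 3 is True, so the list is recognised as unsorted.
print(looks_sorted([3, 2, nan, -1]))   # False


def sort_nan_last(values):
    """Sort numbers in ascending order, placing any NaN entries at the end."""
    # NaNs get the key (True, 0.0) and ordinary numbers get (False, x).
    # False sorts before True, and nan itself never appears in a key.
    return sorted(
        values,
        key=lambda x: (math.isnan(x), 0.0 if math.isnan(x) else x),
    )


print(sort_nan_last([1, 2, nan, -1]))  # [-1, 1, 2, nan]
```

The first list from the question only gets reordered because the comparison 2 < 3 is True before the nan is reached; in the second list every comparison is False, so it is treated as if it were already in order.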