Can you help me understand the following:
import numpy as np
sorted([3,2,np.nan, -1])
output:
[-1, 2, 3, nan]
import numpy as np
sorted([1,2,np.nan, -1])
output:
[1, 2, nan, -1]
It is almost as if something has to trigger a reordering before the NaN is reached; otherwise sorted() returns the list unchanged.
There are two things to consider: how sorted() compares elements, and how nan behaves in comparisons. sorted() uses < (that is, __lt__()) for comparison, and for a nan value the result of that comparison with any number is always False.
>>> 1 > np.nan
False
>>> 1 < np.nan
False
This makes any result that relies on comparison depend on the order of the operands. Here is an example that shows it more clearly:
>>> max(np.nan,1)
nan
>>> max(1,np.nan)
1
As you can see, max() gives a different answer depending on the order of its arguments. That is because the comparison always evaluates to False, and max() follows roughly this simple logic:
def max(a, b):
    # Simplified two-argument model of the built-in max()
    if a < b:
        return b
    else:
        return a
This shows why max() fails to return the true maximum when nan is involved: the comparison is always False, so it returns its first argument.
The same logic applies when the comparison operator is used inside a sorting algorithm: it affects the comparisons made by Timsort, the algorithm behind sorted(). If you want to study this issue further, this GitHub issue may be helpful.
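To make the effect on sorting concrete, here is a minimal sketch. It is not CPython's actual Timsort code, only a simplified model of a check that relies on < alone. The helper names has_no_descent and sort_nan_last are hypothetical, and math.nan is used as a stand-in for np.nan so the snippet needs only the standard library. The sort key shown is one possible workaround, assuming you want NaN values placed last.

```python
import math


def has_no_descent(values):
    """Return True if no element compares < its predecessor."""
    return not any(b < a for a, b in zip(values, values[1:]))


def sort_nan_last(values):
    """Sort numbers and place NaN values at the end.

    The key puts the isnan() flag first, so NaN is ordered by that
    flag and never has to be compared against a real number.
    """
    return sorted(values, key=lambda x: (math.isnan(x), x))


first = [3, 2, math.nan, -1]
second = [1, 2, math.nan, -1]

print(has_no_descent(first))   # False: 2 < 3, so there is work to do
print(has_no_descent(second))  # True: every "<" check involving nan is False
print(sort_nan_last(second))   # [-1, 1, 2, nan]
```

In the second list no element ever compares as smaller than the one before it, so a check based only on < treats the list as already ordered. That matches the unchanged output in the question.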