I have a pandas Series s, and when I call s.std(skipna=True) and s.std(skipna=False) I get different results even when there are no NaN/null values in s. Why? Did I misunderstand the skipna parameter? I'm using pandas 1.3.4.
import pandas as pd
s = pd.Series([10.0]*4800000, index=range(4800000), dtype="float32")
# No NaN/null in the Series
print(s.isnull().any()) # False
print(s.isna().any()) # False
# Why do the two calls below print different results?
print(s.std(skipna=False)) # 0.0
print(s.std(skipna=True)) # 0.61053276
This is an issue with Bottleneck, an optional pandas dependency used to accelerate some NaN-related routines. I think the wrong result comes from loss of precision while calculating the mean: Bottleneck uses naive (sequential) summation, which accumulates rounding error in float32 over 4.8 million elements, while NumPy uses more accurate pairwise summation.
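The precision-loss explanation can be illustrated with NumPy alone. This is a minimal sketch, not Bottleneck's actual code: it assumes that np.cumsum, which accumulates strictly left to right, is a fair stand-in for naive float32 summation, and compares it with np.sum, which uses pairwise summation on contiguous arrays. The variable names are illustrative.

```python
import numpy as np

# Same data as in the question: 4.8 million copies of 10.0 stored as float32.
values = np.full(4_800_000, 10.0, dtype=np.float32)

# np.cumsum keeps a single float32 running total, so its last element is
# what a naive left-to-right summation would produce.
naive_sum = np.cumsum(values)[-1]
naive_mean = naive_sum / values.size

# np.sum on a contiguous array uses pairwise summation, which keeps the
# partial sums small and the rounding error low.
pairwise_sum = values.sum()
pairwise_mean = pairwise_sum / values.size

print("naive mean:   ", naive_mean)
print("pairwise mean:", pairwise_mean)
```

Once the running float32 total is large enough, adding 10.0 can no longer be represented exactly, so the naive mean drifts away from 10.0. A standard deviation computed around that drifted mean is nonzero even though every element is identical.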
You can disable Bottleneck with
pd.set_option('compute.use_bottleneck', False)
to fall back to the NumPy handling.
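If changing the global option is undesirable, the same setting can be applied temporarily. The sketch below assumes pd.option_context suits your use case; it disables Bottleneck only inside the with block, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built with NumPy to avoid a large Python list.
s = pd.Series(np.full(4_800_000, 10.0, dtype="float32"))

# Disable Bottleneck only within this block; the previous setting is
# restored automatically when the block exits.
with pd.option_context("compute.use_bottleneck", False):
    std_without_bottleneck = s.std(skipna=True)

print(std_without_bottleneck)
```

With Bottleneck disabled, the result should agree with s.std(skipna=False).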