
What is the difference between statistics.stdev() and numpy.std(), and which is more precise?


I used this dataset:

lst = [81922.00557103065, 82887.70053475935, 80413.01627033792,
       81708.86075949368, 82997.38219895288, 84641.50943396226,
       81929.82456140351, 82632.24181360201, 77667.98418972333,
       73726.47427854454, 86113.2075471698, 83232.98429319372,
       79866.66666666667, 83833.74689826302, 81943.06930693069,
       77898.64029666255, 77401.47783251232, 80607.59493670886,
       78384.5126835781, 82608.69565217392, 80824.8730964467,
       84163.70106761566, 74887.38738738738
       ]

Then statistics.stdev(lst) is 3096.28 and numpy.std(lst) is 3028.23. The difference is about 2.2%.


Solution

  • They are calculating two slightly different things.

    The standard deviation is the square root of the variance. By default, NumPy computes the population variance, dividing by N, whereas statistics.stdev() applies Bessel's correction, dividing by N − 1. Neither is "more precise" — they are different estimators: the N − 1 version is an unbiased estimate of the population variance when your data is a sample drawn from a larger population.

    import numpy as np

    arr = np.array(lst)
    # NumPy's default: divide by N (population variance, ddof=0)
    var_population = np.sum(np.abs(arr - arr.mean())**2) / arr.size
    # Bessel's correction: divide by N - 1 (sample variance)
    var_bessel = np.sum(np.abs(arr - arr.mean())**2) / (arr.size - 1)


    From the statistics docs:

    This is the sample variance s² with Bessel’s correction, also known as variance with N-1 degrees of freedom. Provided that the data points are representative (e.g. independent and identically distributed), the result should be an unbiased estimate of the true population variance.
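You don't have to compute the correction by hand: `np.std` takes a `ddof` ("delta degrees of freedom") parameter, and `ddof=1` reproduces `statistics.stdev`; likewise `statistics.pstdev` matches NumPy's default. A minimal sketch with an illustrative dataset (the values below are made up for demonstration):

```python
import statistics
import numpy as np

data = [2.5, 3.1, 2.8, 3.6, 2.9]  # illustrative sample, not the original dataset

pop_std = np.std(data)             # divides by N (ddof=0, the default)
sample_std = np.std(data, ddof=1)  # divides by N - 1 (Bessel's correction)

print(np.isclose(sample_std, statistics.stdev(data)))   # True
print(np.isclose(pop_std, statistics.pstdev(data)))     # True
```

So the 2.2% discrepancy in the question disappears entirely once both calls use the same denominator.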