python list statistics variance

How can I calculate the variance of a list in Python?


If I have a list like this:

results=[-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
          0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

I want to calculate the variance of this list in Python, i.e. the average of the squared differences from the mean.

How can I go about this? I'm confused about how to access the elements of the list to compute the squared differences.


Solution

  • You can use numpy's built-in function var:

    import numpy as np
    
    results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
              0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
    
    print(np.var(results))
    

    This gives you 28.822364260579157

    If, for whatever reason, you cannot use numpy or would rather not rely on a built-in function, you can also calculate it "by hand", e.g. with a generator expression:

    # calculate mean
    m = sum(results) / len(results)
    
    # calculate variance using a generator expression
    var_res = sum((xi - m) ** 2 for xi in results) / len(results)
    

    which gives you the same result (up to floating-point rounding).
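    As a further option not mentioned above, the standard library's statistics module (Python 3.4+) has a function for this, so numpy is not strictly required. A minimal sketch, where the variable name pop_var is just chosen for illustration:

```python
import statistics

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

# pvariance is the population variance: it divides by n,
# which corresponds to np.var(results) with its default ddof=0
pop_var = statistics.pvariance(results)
print(pop_var)
```

    This should agree with the np.var value above up to floating-point rounding.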

    If you are interested in the standard deviation, you can use numpy.std:

    print(np.std(results))  # 5.36864640860051
    

    @Serge Ballesta explained very well the difference between dividing by n (population variance) and by n-1 (sample variance). In numpy you can set this with the ddof parameter ("delta degrees of freedom"); the divisor is n - ddof and the default is 0, so for the n-1 case you can simply do:

    np.var(results, ddof=1)
    

    The "by hand" solution is given in @Serge Ballesta's answer.

    Both approaches yield 32.024849178421285.
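    For reference, a "by hand" version with the n-1 divisor could look roughly like the sketch below. This is not the code from the other answer; the function name variance and its ddof argument are hypothetical and only chosen to mirror numpy's parameter.

```python
results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]


def variance(data, ddof=0):
    """Variance of data with divisor n - ddof (hypothetical helper).

    ddof=0 gives the population variance, ddof=1 the sample variance.
    """
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(data) / n
    # sum of squared differences from the mean, divided by n - ddof
    return sum((x - mean) ** 2 for x in data) / (n - ddof)


print(variance(results))          # divides by n
print(variance(results, ddof=1))  # divides by n - 1
```

    The two printed values should match np.var(results) and np.var(results, ddof=1) up to floating-point rounding.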

    You can also set this parameter for std:

    print(np.std(results, ddof=1))  # 5.659050201086865
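    If numpy is not an option, the standard library's statistics module offers counterparts for both cases. The sketch below assumes Python 3.4+; the variable names are only illustrative. It also checks that the standard deviation is the square root of the corresponding variance.

```python
import math
import statistics

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

pop_std = statistics.pstdev(results)    # divisor n, like np.std(results)
sample_std = statistics.stdev(results)  # divisor n - 1, like np.std(results, ddof=1)
print(pop_std, sample_std)

# the standard deviation is the square root of the matching variance
same_pop = math.isclose(pop_std, math.sqrt(statistics.pvariance(results)))
same_sample = math.isclose(sample_std, math.sqrt(statistics.variance(results)))
print(same_pop, same_sample)
```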