If I have a list like this:
results=[-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
I want to calculate the variance of this list in Python, i.e. the average of the squared differences from the mean.
How can I go about this? I'm having trouble accessing the elements of the list to compute the squared differences.
You can use numpy's built-in function var:
import numpy as np
results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
print(np.var(results))
This gives you 28.822364260579157
If, for whatever reason, you cannot use numpy and/or you don't want to use a built-in function for it, you can also calculate it "by hand" using e.g. a generator expression:
# calculate mean
m = sum(results) / len(results)
# calculate variance using a generator expression
var_res = sum((xi - m) ** 2 for xi in results) / len(results)
which gives you the identical result.
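As a self-contained sketch of the "by hand" route, the snippet below wraps the same formula in a function and cross-checks it against `statistics.pvariance` from the standard library. The function name `population_variance` is just an illustrative choice, not something from numpy.

```python
import statistics

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]


def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


by_hand = population_variance(results)
# statistics.pvariance also divides by n, so the two should agree
# up to floating-point rounding.
from_stdlib = statistics.pvariance(results)
print(by_hand, from_stdlib)
```

Both values should match the numpy result above up to floating-point rounding.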
If you are interested in the standard deviation, you can use numpy.std:
print(np.std(results))
5.36864640860051
@Serge Ballesta explained very well the difference between computing the variance with n and with n-1 in the denominator. In numpy you can easily set this using the option ddof; its default is 0, so for the n-1 case you can simply do:
np.var(results, ddof=1)
The "by hand" solution is given in @Serge Ballesta's answer.
Both approaches yield 32.024849178421285.
You can also set this parameter for std:
np.std(results, ddof=1)
5.659050201086865
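If numpy is not available, the effect of ddof can be sketched in plain Python: the sum of squared differences is divided by n - ddof instead of n. The helper names `variance` and `std` below are hypothetical and only mimic the idea of numpy's option; this is a minimal sketch, not numpy's implementation.

```python
import math

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]


def variance(values, ddof=0):
    """Sum of squared differences from the mean, divided by (n - ddof)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def std(values, ddof=0):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(values, ddof))


print(variance(results))          # n in the denominator (ddof=0)
print(variance(results, ddof=1))  # n-1 in the denominator
print(std(results, ddof=1))
```

With ddof=1 this should reproduce the n-1 values shown above, up to floating-point rounding.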