I wasn't sure if this should go on SO or some other .SE site, so I will delete it if it is deemed off-topic.
I have a vector and I'm trying to calculate its variance "by hand" (meaning from the definition of variance, but still performing the calculations in R) using the equation V[X] = E[X^2] - E[X]^2,
where E[X] = sum(x * f(x))
and E[X^2] = sum(x^2 * f(x)).
However, my calculated variance is different from the result of the var() function that R has (which I was using to check my work). Why is the var() function different? How is it calculating variance? I've checked my calculations several times, so I'm fairly confident in the value I calculated. My code is provided below.
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
range(vec)
counts <- hist(vec + .01, breaks = 7)$counts
fx <- counts / sum(counts) # the pmf f(x)
x <- min(vec):max(vec) # the values of x
ex <- sum(x * fx); ex # expected value of x (named ex to avoid masking exp())
ex.square <- sum(x^2 * fx) # expected value of x^2
v <- ex.square - ex^2; v # calculated variance (named v to avoid masking var())
var(vec)
This gives me a calculated variance of 2.234, but the var() function says the variance is 2.383.
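While V[X] = E[X^2] - E[X]^2 is the population variance (when the values in the vector are the whole population, not just a sample), the var() function calculates an estimator for the population variance (the sample variance). It divides the sum of squared deviations from the mean by n - 1 rather than n, so with n = 16 the two results differ by a factor of 16/15: 2.234 * 16/15 ≈ 2.383.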
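As a quick check of the relationship between the two quantities, here is a minimal sketch that uses the vector from the question. The names pop_var, samp_var, and rescaled are illustrative and do not come from the original post. It assumes every observation carries weight 1/n, so mean(vec^2) stands in for E[X^2] and mean(vec) for E[X].

```r
vec <- c(3, 5, 4, 3, 6, 7, 3, 6, 4, 6, 3, 4, 1, 3, 4, 4)
n <- length(vec)

# Population variance from the definition E[X^2] - E[X]^2,
# with each observation weighted by 1/n.
pop_var <- mean(vec^2) - mean(vec)^2

# Sample variance as returned by var(), which divides by n - 1.
samp_var <- var(vec)

# Rescaling the sample variance by (n - 1) / n should recover
# the population variance.
rescaled <- samp_var * (n - 1) / n

print(c(population = pop_var, sample = samp_var, rescaled = rescaled))
```

If the vector really is the whole population, multiplying var(vec) by (n - 1) / n is one simple way to get the population variance without building the pmf by hand.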