In Stata, there's a function called quadvariance() which computes variance using quad precision (extended floating-point arithmetic) to reduce numerical errors, especially in datasets with very small variances or large values. This is useful when double-precision calculations (like those in most standard statistical functions) aren't stable enough.
I'm looking for an equivalent in R—either a built-in function or a package—that can compute variance with higher numerical precision than base R’s var(). Ideally, it would use something like 128-bit precision or arbitrary-precision arithmetic.
Does R have any packages that replicate this kind of functionality? Or is there a standard approach for computing high-precision variance in R?
https://www.stata.com/manuals/m-5mean.pdf
If it does not already exist, I'll probably make one if anyone would like to contribute!
I've already attempted to replicate the behavior of Stata’s quadvariance() by writing a custom R function that calculates the variance (or covariance matrix) using double precision (was not precise enough):
quadvariance <- function(x) {
if (is.null(dim(x)) || nrow(x) <= 1) return(0)
if (is.matrix(x)) {
# For matrices, apply quadvariance to each column
n <- nrow(x)
result <- matrix(0, ncol(x), ncol(x))
means <- colMeans(x)
# Calculate covariance matrix with n divisor
for (i in 1:ncol(x)) {
for (j in 1:ncol(x)) {
result[i, j] <- sum((x[,i] - means[i]) * (x[,j] - means[j])) / n
}
}
return(result)
} else {
# For vectors
n <- length(x)
mean_x <- mean(x)
return(sum((x - mean_x)^2) / n)
}
}
The Rmpfr
package should give you the basic machinery you need for this kind of arbitrary-precision computation:
For the scalar example:
set.seed(101)
x <- rnorm(20)
myvar <- function(x, precision = 100) {
require("Rmpfr")
x <- mpfr(x, precision)
n <- length(x)
mean_x <- mean(x)
return(sum((x - mean_x)^2) / (n-1))
}
(I changed the numerator to n-1
to match R's (unbiased) var()
function ... you can obviously implement this with n
instead of n-1
if you want ...)
set.seed(101)
x <- rnorm(20)
var(x)
## [1] 0.7513151
myvar(x)
## 1 'mpfr' number of precision 100 bits
## [1] 0.7513151279952085783692884323204
On to the multi-dimensional version. To my surprise, stuff like colMeans()
and sweep()
also appears to work!
So, more generally (this will take either vector or matrix input):
mycov <- function(x, precision = 100) {
if (!is.matrix(x)) x <- matrix(x)
m <- mpfr(x, precision)
mm <- sweep(m, MARGIN = 2, colMeans(m), "-")
crossprod(mm)/(nrow(x)-1)
}
set.seed(101)
x <- matrix(rnorm(300), ncol = 3)
var(x)
[,1] [,2] [,3]
[1,] 0.872488576 0.1011235 -0.003926077
[2,] 0.101123497 1.0083622 0.126702847
[3,] -0.003926077 0.1267028 0.761815283
mycov(x)
'mpfrMatrix' of dim(.) = (3, 3) of precision 100 bits
[,1] [,2]
[1,] 0.87248857636991865178270814151709 0.10112349716636467905459567728260
[2,] 0.10112349716636467905459567728260 1.0083622310223205622106287494833
[3,] -0.0039260769128924402797286663292517 0.12670284694556707823168252164396
[,3]
[1,] -0.0039260769128924402797286663292517
[2,] 0.12670284694556707823168252164396
[3,] 0.76181528299482132609996327757919