How do I replicate Stata's quadvariance() function for high-precision variance calculation in R?

In Stata, there's a function called quadvariance() which computes variance using quad precision (extended floating-point arithmetic) to reduce numerical errors, especially in datasets with very small variances or large values. This is useful when double-precision calculations (like those in most standard statistical functions) aren't stable enough.
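Here is one illustration of the kind of error at stake. It is not taken from the Stata manual. The textbook one-pass formula, mean of squares minus square of the mean, breaks down in double precision when the values are large relative to their spread. Subtracting the mean first avoids this:

```r
# Four values with a large common offset; their sample variance is exactly 30.
x <- 1e9 + c(4, 7, 13, 16)
n <- length(x)

# One-pass "textbook" formula. Both terms are about 1e18, where adjacent
# doubles are 128 apart, so their small difference is swamped by rounding.
naive_var <- (sum(x^2) / n - (sum(x) / n)^2) * n / (n - 1)

# Two-pass formula: centre first, then square. The deviations
# (-6, -3, 3, 6) are exact here, so the result is exact too.
two_pass_var <- sum((x - mean(x))^2) / (n - 1)

naive_var
two_pass_var  # 30
var(x)
```

Better algorithms fix this particular case in double precision. Quad or arbitrary precision adds headroom for harder cases.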

I'm looking for an equivalent in R—either a built-in function or a package—that can compute variance with higher numerical precision than base R’s var(). Ideally, it would use something like 128-bit precision or arbitrary-precision arithmetic.

Does R have any packages that replicate this kind of functionality? Or is there a standard approach for computing high-precision variance in R?

https://www.stata.com/manuals/m-5mean.pdf

If nothing like this exists yet, I'll probably write a package for it; contributions are welcome!

I've already tried to replicate Stata’s quadvariance() with a custom R function that computes the variance (or covariance matrix). However, it works in ordinary double precision, which was not precise enough:

quadvariance <- function(x) {
  if (is.matrix(x)) {
    # Matrix input: covariance matrix of the columns, using divisor n
    n <- nrow(x)
    if (n <= 1) return(0)
    result <- matrix(0, ncol(x), ncol(x))
    means <- colMeans(x)

    for (i in seq_len(ncol(x))) {
      for (j in seq_len(ncol(x))) {
        result[i, j] <- sum((x[, i] - means[i]) * (x[, j] - means[j])) / n
      }
    }
    return(result)
  } else {
    # Vector input: population variance, using divisor n
    # (the original dim() check returned 0 for every vector)
    n <- length(x)
    if (n <= 1) return(0)
    mean_x <- mean(x)
    return(sum((x - mean_x)^2) / n)
  }
}
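A different technique can recover accuracy while staying in double precision. It combines compensated summation with the corrected two-pass formula of Chan, Golub and LeVeque. This is a swapped-in technique, not a reproduction of what quadvariance() does internally. `neumaier_sum()` and `compensated_var()` are hypothetical helper names. The `divisor` argument chooses between the question's n and var()'s n - 1:

```r
# Neumaier's variant of Kahan compensated summation: `comp` accumulates
# the low-order bits that each floating-point addition rounds away.
neumaier_sum <- function(v) {
  s <- 0
  comp <- 0
  for (vi in v) {
    t <- s + vi
    if (abs(s) >= abs(vi)) {
      comp <- comp + ((s - t) + vi)
    } else {
      comp <- comp + ((vi - t) + s)
    }
    s <- t
  }
  s + comp
}

# Corrected two-pass variance: the second term cancels most of the error
# introduced by a mean that is slightly off. Hypothetical helper; `divisor`
# chooses n (as in the question) or n - 1 (as in var()).
compensated_var <- function(x, divisor = c("n", "n-1")) {
  divisor <- match.arg(divisor)
  n <- length(x)
  if (n <= 1) return(0)
  m <- neumaier_sum(x) / n
  d <- x - m
  ss <- neumaier_sum(d^2) - neumaier_sum(d)^2 / n
  denom <- if (divisor == "n") n else n - 1
  ss / denom
}

x <- 1e9 + c(4, 7, 13, 16)
compensated_var(x)         # 22.5 (divisor n)
compensated_var(x, "n-1")  # 30, same as var(x)
```

Every step here is still a double, so this narrows the gap to a quad-precision result rather than closing it. The R-level loop is also slow on long vectors.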

Solution

The Rmpfr package should give you the basic machinery you need for this kind of arbitrary-precision computation:
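This minimal sketch shows what Rmpfr adds, assuming the package is installed from CRAN. `mpfr()` converts doubles to numbers with a chosen significand width, and ordinary arithmetic then runs at that precision. As a result, a difference that double precision rounds away survives:

```r
library(Rmpfr)

# Near 1e16 adjacent doubles are 2 apart, so adding 1 is rounded away.
double_diff <- (1e16 + 1) - 1e16

# The same arithmetic with a 100-bit significand keeps the 1.
big <- mpfr(1e16, precBits = 100)
mpfr_diff <- (big + 1) - big

double_diff           # 0
getPrec(big)          # 100
asNumeric(mpfr_diff)  # 1
```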

For the scalar example:

library(Rmpfr)

myvar <- function(x, precision = 100) {
  # Converting doubles to mpfr is exact when precision >= 53 bits
  x <- mpfr(x, precision)
  n <- length(x)
  mean_x <- mean(x)
  return(sum((x - mean_x)^2) / (n - 1))
}

(I changed the denominator to n - 1 to match R's unbiased var(). You can use n instead if you want the population variance, as in the function from the question.)
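A small variation makes the choice between the two divisors explicit. `mpfr_var()` and its `unbiased` argument are hypothetical names used only for this sketch, which assumes Rmpfr is installed:

```r
library(Rmpfr)

# Variance computed in `precision` bits. unbiased = TRUE divides by n - 1
# (like var()); unbiased = FALSE divides by n (like the question's function).
mpfr_var <- function(x, precision = 100, unbiased = TRUE) {
  xm <- mpfr(x, precision)
  n <- length(xm)
  ss <- sum((xm - mean(xm))^2)
  denom <- if (unbiased) n - 1 else n
  ss / denom
}

x <- c(1, 2, 3, 4)
asNumeric(mpfr_var(x))                    # 5/3, matches var(x)
asNumeric(mpfr_var(x, unbiased = FALSE))  # 1.25
```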

set.seed(101)
x <- rnorm(20)
var(x)
## [1] 0.7513151
myvar(x)
## 1 'mpfr' number of precision  100   bits 
## [1] 0.7513151279952085783692884323204
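To quantify how far base R's double-precision var() is from the high-precision value, convert the mpfr result back with asNumeric() instead of comparing printed digits by eye. This sketch also uses precision = 113, the significand width of IEEE 754 binary128 ("quad") numbers. Whether that matches Stata's internal quad format exactly is an assumption:

```r
library(Rmpfr)

set.seed(101)
x <- rnorm(20)

# 113-bit sample variance (same formula as myvar(), repeated here so the
# snippet runs on its own).
xm <- mpfr(x, 113)
v_hp <- sum((xm - mean(xm))^2) / (length(x) - 1)

# Relative difference between the double result and the 113-bit one.
rel_diff <- asNumeric(abs(var(x) - v_hp) / v_hp)
rel_diff
```

For well-conditioned data like this, the difference should be on the order of double-precision rounding. The extra bits matter mainly for ill-conditioned data, such as large values with a small spread.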

On to the multi-dimensional version. Somewhat to my surprise, colMeans(), sweep() and crossprod() also appear to work on mpfr matrices!

So, more generally (this will take either vector or matrix input):

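mycov <- function(x, precision = 100) {
  if (!is.matrix(x)) x <- matrix(x)  # treat a vector as a one-column matrix
  m <- mpfr(x, precision)
  # Centre each column on its high-precision mean
  mm <- sweep(m, MARGIN = 2, colMeans(m), "-")
  # t(mm) %*% mm, divided by n - 1 as in var()
  crossprod(mm) / (nrow(x) - 1)
}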
set.seed(101)
x <- matrix(rnorm(300), ncol = 3)

var(x)
##             [,1]      [,2]         [,3]
## [1,]  0.872488576 0.1011235 -0.003926077
## [2,]  0.101123497 1.0083622  0.126702847
## [3,] -0.003926077 0.1267028  0.761815283

mycov(x)
## 'mpfrMatrix' of dim(.) =  (3, 3) of precision  100   bits
##     [,1]                                  [,2]
## [1,]    0.87248857636991865178270814151709 0.10112349716636467905459567728260
## [2,]    0.10112349716636467905459567728260  1.0083622310223205622106287494833
## [3,] -0.0039260769128924402797286663292517 0.12670284694556707823168252164396
##      [,3]
## [1,]   -0.0039260769128924402797286663292517
## [2,]    0.12670284694556707823168252164396
## [3,]    0.76181528299482132609996327757919
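If downstream code expects an ordinary numeric matrix, asNumeric() can round the mpfrMatrix back to double. This also makes it easy to check agreement with var(). The sketch below is self-contained: it repeats mycov() from above and assumes Rmpfr is installed:

```r
library(Rmpfr)

mycov <- function(x, precision = 100) {
  if (!is.matrix(x)) x <- matrix(x)
  m <- mpfr(x, precision)
  mm <- sweep(m, MARGIN = 2, colMeans(m), "-")
  crossprod(mm) / (nrow(x) - 1)
}

set.seed(101)
x <- matrix(rnorm(300), ncol = 3)

cov_hp <- mycov(x)
# Round back to double; matrix() restores the 3 x 3 shape explicitly.
cov_dbl <- matrix(asNumeric(cov_hp), ncol(x), ncol(x))

all.equal(cov_dbl, var(x))
```

Rounding back to double discards the extra digits. Keep `cov_hp` when later steps, such as inverting a nearly singular covariance matrix, also need the higher precision.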