rvariancequadruple-precision

How do I replicate Stata's quadvariance() function for high-precision variance calculation in R?


In Stata, there's a function called quadvariance() which computes variance using quad precision (extended floating-point arithmetic) to reduce numerical errors, especially in datasets with very small variances or large values. This is useful when double-precision calculations (like those in most standard statistical functions) aren't stable enough.

I'm looking for an equivalent in R—either a built-in function or a package—that can compute variance with higher numerical precision than base R’s var(). Ideally, it would use something like 128-bit precision or arbitrary-precision arithmetic.

Does R have any packages that replicate this kind of functionality? Or is there a standard approach for computing high-precision variance in R?

https://www.stata.com/manuals/m-5mean.pdf

If it does not already exist, I'll probably make one if anyone would like to contribute!

I've already attempted to replicate the behavior of Stata’s quadvariance() by writing a custom R function that calculates the variance (or covariance matrix) using double precision (was not precise enough):

quadvariance <- function(x) {
  if (is.null(dim(x)) || nrow(x) <= 1) return(0)

  if (is.matrix(x)) {
    # For matrices, apply quadvariance to each column
    n <- nrow(x)
    result <- matrix(0, ncol(x), ncol(x))
    means <- colMeans(x)

    # Calculate covariance matrix with n divisor
    for (i in 1:ncol(x)) {
      for (j in 1:ncol(x)) {
        result[i, j] <- sum((x[,i] - means[i]) * (x[,j] - means[j])) / n
      }
    }
    return(result)
  } else {
    # For vectors
    n <- length(x)
    mean_x <- mean(x)
    return(sum((x - mean_x)^2) / n)
  }
}

Solution

  • The Rmpfr package should give you the basic machinery you need for this kind of arbitrary-precision computation:

    For the scalar example:

    set.seed(101) 
    x <- rnorm(20)
    myvar <- function(x, precision = 100) {
      require("Rmpfr")
      x <- mpfr(x, precision)
      n <- length(x)
      mean_x <- mean(x)
      return(sum((x - mean_x)^2) / (n-1))
    }
    

    (I changed the numerator to n-1 to match R's (unbiased) var() function ... you can obviously implement this with n instead of n-1 if you want ...)

    set.seed(101)
    x <- rnorm(20)
    var(x)
    ## [1] 0.7513151
    myvar(x)
    ## 1 'mpfr' number of precision  100   bits 
    ## [1] 0.7513151279952085783692884323204
    

    On to the multi-dimensional version. To my surprise, stuff like colMeans() and sweep() also appears to work!

    So, more generally (this will take either vector or matrix input):

    mycov <- function(x, precision = 100) {
       if (!is.matrix(x)) x <- matrix(x) 
       m <- mpfr(x, precision)
       mm <- sweep(m, MARGIN = 2, colMeans(m), "-")
       crossprod(mm)/(nrow(x)-1)
    }
    set.seed(101)
    x <- matrix(rnorm(300), ncol = 3)
    
    var(x)
                [,1]      [,2]         [,3]
    [1,]  0.872488576 0.1011235 -0.003926077
    [2,]  0.101123497 1.0083622  0.126702847
    [3,] -0.003926077 0.1267028  0.761815283
    
    mycov(x)
    'mpfrMatrix' of dim(.) =  (3, 3) of precision  100   bits 
        [,1]                                  [,2]                              
    [1,]    0.87248857636991865178270814151709 0.10112349716636467905459567728260
    [2,]    0.10112349716636467905459567728260  1.0083622310223205622106287494833
    [3,] -0.0039260769128924402797286663292517 0.12670284694556707823168252164396
         [,3]                                 
    [1,]   -0.0039260769128924402797286663292517
    [2,]    0.12670284694556707823168252164396
    [3,]    0.76181528299482132609996327757919