rmatrixr-bigmemorylarge-data

Computing the null space of a bigmatrix in R


I can not find any function or package to calculate the null space or (QR decomposition) of a bigmatrix (from library(bigmemory)) in R. For example:

library(bigmemory)

a <- big.matrix(1000000, 1000, type='double', init=0)

I tried the following but got the errors shown. How can I find the null space of a bigmemory object?

a.qr <- Matrix::qr(a)
# Error in as.vector(data) : 
#   no method for coercing this S4 class to a vector
q.null <- MASS::Null(a)
# Error in as.vector(data) : 
#   no method for coercing this S4 class to a vector

Solution

  • If you want to compute the full SVD of the matrix, you can use package bigstatsr to perform computations by block. A FBM stands for a Filebacked Big Matrix and is an object similar to a filebacked big.matrix object of package bigmemory.

    library(bigstatsr)
    options(bigstatsr.block.sizeGB = 0.5)
    
    # Initialize FBM with random numbers
    a <- FBM(1e6, 1e3)
    big_apply(a, a.FUN = function(X, ind) {
      X[, ind] <- rnorm(nrow(X) * length(ind))
      NULL
    }, a.combine = 'c')
    
    # Compute t(a) * a
    K <- big_crossprodSelf(a, big_scale(center = FALSE, scale = FALSE))
    
    # Get v and d where a = u * d * t(v) the SVD of a
    eig <- eigen(K[])
    v <- eig$vectors
    d <- sqrt(eig$values)
    
    # Get u if you need it. It will be of the same size of u
    # so that I store it as a FBM.
    u <- FBM(nrow(a), ncol(a))
    big_apply(u, a.FUN = function(X, ind, a, v, d) {
      X[ind, ] <- sweep(a[ind, ] %*% v, 2, d, "/")
      NULL
    }, a.combine = 'c', block.size = 50e3, ind = rows_along(u),
    a = a, v = v, d = d)
    
    # Verification
    ind <- sample(nrow(a), 1000)
    all.equal(a[ind, ], tcrossprod(sweep(u[ind, ], 2, d, "*"), v))
    

    This takes approximately 10 minutes on my computer.