rmlogit

R mlogit package: use LAPACK instead of LINPACK


I am estimating a fairly simple McFadden choice model using a very large data set (101.6 million unit-alternatives). I can estimate this model just fine in Stata using the asclogit command, but when I try to use the mlogit package in R, I get the following error:

region1 <- mlogit(chosen ~ mean_log.wage + mean_log.rent + bornNear + Dim.1 + regionFE | 0,
                  shape= "long", chid.var = "chid", alt.var = "alternatives", data = ready)

Error in qr.default(na.omit(X)) : too large a matrix for LINPACK
Calls: mlogit ... model.matrix -> model.matrix.mFormula -> qr -> qr.default

If I look at the source code of qr.R it's clear that the number of elements in my design matrix is too big relative to the LINPACK limit of 2,147,483,647. However, no such limit exists for LAPACK (that I can tell, at least).

From qr.R:

qr.default <- function(x, tol = 1e-07, LAPACK = FALSE, ...)
{
    x <- as.matrix(x)
    if(is.complex(x))
        return(structure(.Internal(La_qr_cmplx(x)), class = "qr"))
    ## otherwise :
    if(LAPACK)
        return(structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr"))
    ## else "Linpack" case:
    p <- as.integer(ncol(x))
    if(is.na(p)) stop("invalid ncol(x)")
    n <- as.integer(nrow(x))
    if(is.na(n)) stop("invalid nrow(x)")
    if(1.0 * n * p > 2147483647) stop("too large a matrix for LINPACK")
    ...

qr() appears to be called in the mFormula method of mlogit, when model.matrix is being created, and probably while checking NAs. But I can't tell if there is a way to pass LAPACK = TRUE to mlogit, or if there is a way to skip the NA checking.

I'm hoping @YvesCroissant will see this.

As I mentioned, I can estimate this model just fine in Stata, so it's not a question of resources. My Stata license is not portable, however, which is why I would like to use R.


Solution

  • Thanks to Julius' comment and this post on namespaces in R, I figured out the answer. I added the following code right after my library statements:

    source("mymFormula.R")
    tmpfun <- get("model.matrix.mFormula", envir = asNamespace("mlogit"))
    environment(mymFormula) <- environment(tmpfun)
    attributes(mymFormula) <- attributes(tmpfun)  # don't know if this is really needed
    assignInNamespace("model.matrix.mFormula", mymFormula, ns="mlogit")
    

    mymFormula.R is an R script where I copy/pasted the contents of mlogit:::model.matrix.mFormula and added mymFormula <- before the function invocation at the top of the file.

    I viewed the contents of mlogit:::model.matrix.mFormula by typing trace(mlogit:::model.matrix.mFormula, edit=TRUE) in RStudio. (Thanks to this answer for help on how to do that.)