Tags: r, random-forest, r-mice

Error using random forest (MICE package) during imputation


I would like to use random forests to impute missing values. I have read some papers claiming that MICE with random forests performs better than parametric MICE.

I have already run mice with method = "pmm" (the first call below), obtained results and worked with them. However, when I switched the method to random forest ("rf"), I got an error and I'm not sure why. I've seen other questions about errors with random forest in mice, but they don't match my case: my variables have more than a single NA.

imp <- mice(data1, m=70, pred=quickpred(data1), method="pmm", seed=71152, printFlag=TRUE)
impRF <- mice(data1, m=70, pred=quickpred(data1), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero

Does anyone know why I'm getting this error?
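The error message quotes the check if (n == 0) from randomForest(). "Argument is of length zero" is what R reports when that comparison returns a zero-length result, which happens when n is NULL, and nrow() returns NULL for a plain vector. Subsetting a one-column matrix with x[ry, ] silently drops it to a vector. The sketch below reproduces that mechanism in base R on hypothetical toy data. It does not call mice or randomForest, so that this is exactly what happens inside the rf method is an assumption, consistent with the single-variable explanation in the answer.

```r
# Toy stand-in for the predictor matrix the rf method receives when only one
# predictor column is available (hypothetical data, not from the question)
x  <- matrix(c(2.1, 3.4, 1.7, 5.0, 4.2), ncol = 1)
ry <- c(TRUE, TRUE, FALSE, TRUE, FALSE)   # TRUE = response observed

xobs_dropped <- x[ry, ]                   # one column: R drops it to a vector
xobs_kept    <- x[ry, , drop = FALSE]     # stays a 3 x 1 matrix

n <- nrow(xobs_dropped)                   # NULL, because it is no longer a matrix
# Same shape of check as in the error message; NULL == 0 is logical(0)
msg <- tryCatch(if (n == 0) "no rows" else "has rows",
                error = conditionMessage)

print(is.null(n))        # [1] TRUE
print(msg)               # [1] "argument is of length zero"
print(nrow(xobs_kept))   # [1] 3
```

The drop = FALSE version keeps the row count available, which is why the corrected function in the answer needs to make sure a matrix, not a vector, reaches randomForest().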

EDIT

I tried converting all variables to numeric instead of using dummy variables, and got the same error plus some warnings:

impRF <- mice(data, m=70, pred=quickpred(data), method="rf", seed=71152, printFlag=TRUE)

 iter imp variable
   1   1  Vac  CliForm
 Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
 In addition: There were 50 or more warnings (use warnings() to see the first 50)

 50: In randomForest.default(x = xobs, y = yobs, ntree = 1,  ... :
   The response has five or fewer unique values.  Are you sure you want to do regression?
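This warning means randomForest() is fitting regression trees to a response with five or fewer distinct values, which is what happens once categorical variables are stored as numbers. Keeping such variables as factors should make the rf method grow classification trees for them instead. This addresses the warnings, not the "0 rows" error. A minimal sketch, assuming every low-cardinality numeric column is genuinely categorical (check this before applying it); numeric_to_factor and its max_levels threshold are hypothetical.

```r
# Hypothetical helper: convert numeric columns with few distinct observed
# values into factors, so randomForest() treats them as classification targets.
# max_levels = 5 mirrors the warning text and is an assumption.
numeric_to_factor <- function(df, max_levels = 5) {
  for (v in names(df)) {
    col <- df[[v]]
    if (is.numeric(col) && length(unique(col[!is.na(col)])) <= max_levels) {
      df[[v]] <- factor(col)   # NA values stay NA in the factor
    }
  }
  df
}

# Toy data: Smk is a 0/1 indicator, Age is continuous
toy <- data.frame(Smk = c(0, 1, NA, 1, 0, 1, 0, 0),
                  Age = c(23, 45, 31, NA, 52, 38, 60, 29))
toy2 <- numeric_to_factor(toy)
str(toy2)
```

Applied to the question's data this would be numeric_to_factor(data) before calling mice().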

EDIT1

I then tried only 5 imputations on a smaller subset of the data (2000 rows) and got some different errors:

> imp <- mice(data2, m=5, pred=quickpred(data2), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac  Radio  Origin  Job  Alc  Smk  Drugs  Prison  Commu  Hmless  Symp
Error in randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs in foreign function call (arg 11)
 In addition: Warning messages:
 1: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : invalid mtry: reset to within valid range
 2: In max(ncat) : no non-missing arguments to max; returning -Inf
 3: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs introduced by coercion
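"NAs introduced by coercion" means some value could not be converted to a number before the model was fitted. One cheap thing to rule out, as an assumption rather than a confirmed diagnosis, is character columns in data2. A base-R sketch; non_numeric_columns is a hypothetical helper.

```r
# Hypothetical diagnostic: names of columns that are neither numeric, factor
# nor logical. A character column is one way a value can end up coerced to NA
# inside a model-fitting call.
non_numeric_columns <- function(df) {
  ok <- vapply(df, function(col) is.numeric(col) || is.factor(col) || is.logical(col),
               logical(1))
  names(df)[!ok]
}

# Toy data: Job was read in as character
toy <- data.frame(Job = c("a", "b", NA), Alc = c(1, 0, 1),
                  stringsAsFactors = FALSE)
print(non_numeric_columns(toy))   # [1] "Job"
```

Called as non_numeric_columns(data2); any columns it returns could be converted with factor() before calling mice().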

Solution

  • I also encountered this error when I had only one fully observed variable, which I'm guessing is the cause in your case too. My colleague Anoop Shah provided me with a fix (below) and Prof van Buuren (mice's author) has said he will include it in the next update of the package.

    In R, run the following to open the rf imputation function for editing, so that you can redefine it:

    fixInNamespace("mice.impute.rf", "mice")

    The corrected function to paste in is then:

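    mice.impute.rf <- function(y, ry, x, ntree = 100, ...) {
        ntree <- max(1, ntree)
        # drop = FALSE keeps a matrix even with only one predictor column (or
        # one missing row); otherwise randomForest() receives a vector, nrow()
        # returns NULL and its `if (n == 0)` check fails
        xobs <- as.matrix(x[ry, , drop = FALSE])
        xmis <- as.matrix(x[!ry, , drop = FALSE])
        yobs <- y[ry]
        # Grow a single tree; for each missing case return the observed y
        # values that land in the same terminal node (the donor pool)
        onetree <- function(xobs, xmis, yobs, ...) {
            fit <- randomForest::randomForest(x = xobs, y = yobs, ntree = 1, ...)
            # terminal-node ids are returned in the "nodes" attribute
            leafnr <- attr(predict(fit, newdata = xobs, nodes = TRUE), "nodes")
            nodes <- attr(predict(fit, newdata = xmis, nodes = TRUE), "nodes")
            lapply(nodes, function(s) yobs[leafnr == s])
        }
        # cbind() keeps a (missing cases x ntree) list-matrix even when only
        # one value is missing, so apply() below always gets two dimensions
        forest <- do.call(cbind, lapply(seq_len(ntree),
            function(s) onetree(xobs, xmis, yobs, ...)))
        # Pool donors across trees and draw one per missing case; indexing
        # avoids sample(k, 1) meaning sample(1:k, 1) for a single numeric donor
        impute <- apply(forest, MARGIN = 1, FUN = function(s) {
            donors <- unlist(s)
            donors[sample.int(length(donors), 1)]
        })
        return(impute)
    }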
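Because the answer ties the error to having only one usable variable, it may help to check which incomplete variables end up with fewer than two predictors in the quickpred() matrix before running mice(). A base-R sketch; few_predictors is a hypothetical helper, the threshold of two is an assumption based on that explanation, and it counts variables rather than any dummy columns mice may create from factors, so treat it as a rough check.

```r
# Hypothetical helper: names of incomplete variables whose imputation model
# has fewer than `min_pred` predictors in a mice-style predictor matrix
# (rows = variables to impute, columns = predictors, non-zero = used).
few_predictors <- function(data, pred, min_pred = 2) {
  has_na  <- colSums(is.na(data)) > 0
  n_pred  <- rowSums(pred != 0)
  flagged <- has_na[rownames(pred)] & n_pred < min_pred
  rownames(pred)[flagged]
}

# Toy data: `b` is incomplete and would be imputed from `a` alone
toy <- data.frame(a = c(1, NA, 3, 4), b = c(NA, 2, 3, 5), c = c(1, 2, 3, 4))
pred <- matrix(c(0, 1, 1,
                 1, 0, 0,
                 0, 0, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(names(toy), names(toy)))
print(few_predictors(toy, pred))   # [1] "b"
```

With the question's objects this would be few_predictors(data1, quickpred(data1)). Variables it flags could be given more predictors in the predictor matrix, or a different entry in mice's per-variable method vector, before retrying method = "rf".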