
Why is observation.weights not letting me use my weight variable in ltmle?


I am using the ltmle and parallel packages to run an analysis on a list of many imputed data frames on a remote cluster. Here is my code:

set.seed(500, kind = "L'Ecuyer-CMRG")
numcores <- future::availableCores()
cl <- parallel::makeCluster(numcores)

# Load ltmle and define the node names on every worker
parallel::clusterEvalQ(cl, library(ltmle))
parallel::clusterEvalQ(cl, Avar <- c("var4", "var5", "var6"))
parallel::clusterEvalQ(cl, Lvar <- c("var1", "var2", "var3"))
parallel::clusterEvalQ(cl, Yvar <- c("var7", "var8"))
parallel::clusterEvalQ(cl, wt <- c("weight"))

# Fit ltmle to each imputed data frame in parallel
list.of.imputed.dfs <- parallel::parLapply(cl = cl, list.of.imputed.dfs, function(x) {
  ltmle(data = x,
        Anodes = Avar,
        Lnodes = Lvar,
        Ynodes = Yvar,
        survivalOutcome = TRUE,
        observation.weights = wt,
        variance.method = "ic",
        abar = list(c(1,1,1,1), c(0,0,0,0)))
})

This code runs when the line observation.weights = wt, is commented out, but gives me an error when I leave it in as written above. The error reads:

Error in checkForRemoteErrors(val) :
4 nodes produced errors: first error: observation.weights must be NULL or a vector of length nrow(data) with no NAs, no negative values, and at least one positive value

I checked all of these properties of my weight variable in every imputed data frame (the weights are identical across imputations because the original data frame had no missing weights). There are no NAs and all values are positive. The weight variable is a vector with as many values as there are rows in each data frame in list.of.imputed.dfs. The weight variable is also the first column of each data frame.

Is there some reason that the way I wrote my code above is not working besides the information in the error message? Or am I missing something key from the error message?
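The detail that matters is what wt holds on each worker: clusterEvalQ(cl, wt <- c("weight")) assigns the character string "weight", not the values of the weight column, so ltmle receives a vector of length 1. The base-R sketch below re-applies the conditions quoted in the error message to both objects. The helper weights_ok() and the toy data frame are hypothetical illustrations, not ltmle's internal check.

```r
# Hypothetical helper: the conditions quoted in the ltmle error message,
# restated in base R (not ltmle's own code). is.numeric() is an added
# assumption implied by the sign checks.
weights_ok <- function(w, data) {
  is.numeric(w) &&
    length(w) == nrow(data) &&   # one weight per row
    !anyNA(w) &&                 # no NAs
    all(w >= 0) &&               # no negative values
    any(w > 0)                   # at least one positive value
}

# Toy stand-in for one imputed data frame (made-up values)
df <- data.frame(weight = c(1.2, 0.8, 2.5), var1 = c(0, 1, 1))

wt <- c("weight")          # what each worker actually received
weights_ok(wt, df)         # FALSE: a length-1 character vector, not the weights
weights_ok(df$weight, df)  # TRUE: the values of the weight column
```

In other words, the weight column itself passes every check; the problem is that the column name, not the column, is being passed to observation.weights.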


Solution

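  • Remove the weight variable from the columns included in each imputed data frame. ltmle treats the columns of data as nodes, so a weight column left in the data (here, the first column) would presumably be modeled as a baseline covariate.

    Then export the weights to the cluster workers and assign the actual weight values, not the column name, to wt on each worker:

    parallel::clusterExport(cl, "data", envir = environment())
    parallel::clusterEvalQ(cl, wt <- data$weight)

    Here data is a data frame that contains the weight column. Because the weights are identical in every imputation, the original data frame can be used.

    Do not do this:

    parallel::clusterEvalQ(cl, wt <- c("weight"))

    That line only assigns the character string "weight" to wt, so observation.weights receives a vector of length 1 instead of one weight per row of data, which is exactly what the error message complains about.

    Then run ltmle as above.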
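An alternative that does not depend on the weights being identical across imputations is to take them from each imputed data frame inside the worker function and drop the column before calling ltmle. This is a minimal sketch, assuming the column is named "weight" and that Avar, Lvar and Yvar are defined on the workers as in the question. The name fit_one_imputation() is hypothetical, and its fit argument exists only so the function can be exercised without ltmle installed.

```r
# Hypothetical helper: fit ltmle on one imputed data frame, using that data
# frame's own "weight" column as observation.weights.
# Assumes Avar, Lvar and Yvar exist where the function runs (e.g. defined on
# the cluster workers with clusterEvalQ, as in the question).
fit_one_imputation <- function(x, fit = ltmle::ltmle) {
  w <- x[["weight"]]       # numeric vector with one value per row of x
  x[["weight"]] <- NULL    # drop the column so it is not treated as a node
  fit(data = x,
      Anodes = Avar,
      Lnodes = Lvar,
      Ynodes = Yvar,
      survivalOutcome = TRUE,
      observation.weights = w,
      variance.method = "ic",
      abar = list(c(1, 1, 1, 1), c(0, 0, 0, 0)))
}
```

On the cluster this would be called as results <- parallel::parLapply(cl, list.of.imputed.dfs, fit_one_imputation). The function itself is shipped to the workers as FUN, but Avar, Lvar and Yvar are looked up in each worker's global environment, so they still need to be defined there first.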