Tags: r, imputation, panel-data, r-mice, multilevel-analysis

Multiple imputation for longitudinal data with 2l.pan or panImpute (mice package in R)


I have a longitudinal (panel) data frame called tradep_red in long format. It covers 200 countries (country) over 26 years (year) and contains the continuous dependent variable gini and 2 continuous predictor variables, trade and unempl (in reality there are 13 predictors, but I reduced them to 2 for the sake of this question). Both gini and the predictor variables contain missing values. Dummy data is shown below:

# Generate dummy data 
set.seed(12345)
country <- as.factor(rep(1:200, each = 26))
year <- rep(1:26, times = 200)
gini <- rnorm(n = 200*26, mean = 20, sd = 4)
trade <- rnorm(n = 200*26, mean = 1000, sd = 7)
unempl <- rnorm(n = 200*26, mean = 4, sd = 0.2)

# Add NA values 
missing_indices_gini <- sample(1:length(gini), 1000)
gini[missing_indices_gini] <- NA
missing_indices_trade <- sample(1:length(trade), 800)
trade[missing_indices_trade] <- NA
missing_indices_unempl <- sample(1:length(unempl), 900)
unempl[missing_indices_unempl] <- NA

# Combine into dataframe
tradep_red <- data.frame(country, year, gini, trade, unempl)
head(tradep_red)
##   country year     gini     trade   unempl
## 1       1    1 22.34212 1006.3982 3.740346
## 2       1    2 22.83786  997.7583 3.801918
## 3       1    3 19.56279  996.9160 3.699202
## 4       1    4       NA        NA 3.838534
## 5       1    5 22.42355  996.0563 3.835563
## 6       1    6       NA 1005.5007 4.115319

I want to multiply impute the missing values while explicitly accounting for the multilevel structure of the data (i.e. clustering by country). With the code below (using the mice package), I have been able to create imputed data sets with the pmm method:

library(mice)

# Multiple imputation
predictorMatrix <- quickpred(tradep_red, 
                             include = c("country", "gini", "trade", "unempl"), 
                             exclude = c("year"), mincor = 0.1)

imp <- mice(data = tradep_red, 
            m = 3, 
            maxit = 5, 
            method = "pmm", 
            predictorMatrix = predictorMatrix,  
            seed = 123)

However, I would like to use the 2l.pan method (or another method such as panImpute) to account for the cluster variable country. The 2l.pan method requires a cluster variable to be specified in the predictorMatrix by giving country a value of -2, and then running the imputation:

predictorMatrix["country", ] <- -2 # specify country as cluster variable

imp <- mice(data = tradep_red, 
            m = 3, 
            maxit = 5, 
            method = "2l.pan", 
            predictorMatrix = predictorMatrix,  
            seed = 123)

This however gives the error:

## iter imp variable
##  1   1  giniError in mice.impute.2l.pan(y = c(22.3421152713754, 22.8378640700381,  : 
##  No class variable

Alternatively, the cluster variable can be specified in a formula statement with the | operator, and the formula statement has to be passed as a list. I have not managed to specify this formula statement correctly. The code below shows what I have tried:

formula_imp <- list(gini + trade + unempl ~ (1 | country))

imp <- mice(data = tradep_red, 
            m = 3, 
            maxit = 5, 
            method = "2l.pan", 
            predictorMatrix = predictorMatrix, 
            formulas = formula_imp, 
            seed = 123)

This gives the error:

## iter imp variable
##  1   1  gini trade unempl  giniError in mice.impute.2l.pan(y = c(22.3421152713754, 22.8378640700381,  : 
##  No class variable
## In addition: Warning messages:
## 1: In Ops.factor(1, country) : ‘|’ not meaningful for factors
## 2: In Ops.factor(1, country) : ‘|’ not meaningful for factors
## 3: In Ops.factor(1, country) : ‘|’ not meaningful for factors

I get similar errors when trying to use the alternative panImpute method in the mice function. How can I correctly specify country to be the cluster variable for the multiple imputation process? Any help or references are greatly appreciated!


Solution

  • The class variable needs to be an integer, not a factor. Add the following conversion so that the predictorMatrix approach from your first attempt can be used:

    library(dplyr)
    tradep_red <- tradep_red %>% mutate(country = as.integer(country))
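One further detail may be worth checking. As far as I understand the mice documentation, each row of the predictorMatrix belongs to a variable being imputed and each column to one of its predictors, so the -2 flag would go into the country column for the incomplete variables rather than into the country row. Below is a minimal sketch under that assumption. It uses a smaller stand-in for the dummy data (20 countries and 10 years, sizes picked only to keep the run short), a base R conversion instead of dplyr, and a helper vector targets that is just an illustrative name. The pan package has to be installed alongside mice.

```r
library(mice)  # method "2l.pan" also needs the pan package installed

# Smaller stand-in for the question's dummy data (assumed sizes: 20 x 10)
set.seed(12345)
n_country <- 20
n_year <- 10
n <- n_country * n_year
tradep_red <- data.frame(
  country = as.factor(rep(1:n_country, each = n_year)),
  year    = rep(1:n_year, times = n_country),
  gini    = rnorm(n, mean = 20, sd = 4),
  trade   = rnorm(n, mean = 1000, sd = 7),
  unempl  = rnorm(n, mean = 4, sd = 0.2)
)
tradep_red$gini[sample(n, 40)]   <- NA
tradep_red$trade[sample(n, 30)]  <- NA
tradep_red$unempl[sample(n, 35)] <- NA

# The class variable has to be an integer, not a factor
tradep_red$country <- as.integer(tradep_red$country)

predictorMatrix <- quickpred(tradep_red,
                             include = c("country", "gini", "trade", "unempl"),
                             exclude = c("year"), mincor = 0.1)

# Assumption: rows are the variables being imputed, columns their predictors,
# so country is flagged with -2 in the row of every incomplete variable
targets <- c("gini", "trade", "unempl")  # illustrative helper name
predictorMatrix[targets, "country"] <- -2

imp <- mice(data = tradep_red, m = 3, maxit = 5, method = "2l.pan",
            predictorMatrix = predictorMatrix, seed = 123)
```

The same two steps (integer conversion, then -2 in the country column) should carry over unchanged to the full 200 x 26 data frame from the question.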