
Creating 5 complete data sets from one incomplete data set in a simulation study [mice package in R]


For a study, I need to generate five complete data sets from each of 100 incomplete data sets using the mice package in R.

The following code works correctly for a single data set, df1:

df1_imp <- mice(df1, m = 5, method = 'logreg', print = F)

The five completed data sets can then be accessed as follows:

dataset1 <- complete(df1_imp, 1)
dataset2 <- complete(df1_imp, 2)
dataset3 <- complete(df1_imp, 3)
dataset4 <- complete(df1_imp, 4)
dataset5 <- complete(df1_imp, 5)

That works for one data set. However, I have 100 incomplete data sets, each of which will yield 5 complete data sets (500 in total). How can I access all 500 data sets so that I can analyze them?

[dfs] My list of data sets (a sample of 3 here; each must produce 5 complete data sets, so 3 x 5 = 15):

list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1, 
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA, 
1, 0, 0, 0, 1, 1, 0), dim = 6:5))

Solution

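  • Apply mice to each element of the list with lapply, then call complete on each result. In complete, select action='all' to return all imputed data sets at once and include=FALSE to exclude the original, un-imputed data set. For simulation studies you may want to specify a seed so the results are reproducible.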

    > library(mice)
    > seed. <- 42
    > lapply(raw_data, mice, m=5, method='pmm', seed=seed., printFlag=FALSE) |> 
    +   lapply(complete, action='all', include=FALSE)
    [[1]]
    $`1`
      V1 V2 V3 V4 V5
    1  1  1  0  0  0
    2  0  0  0  1  1
    3  0  1  1  0  1
    4  1  1  0  1  1
    5  0  0  1  1  1
    6  0  0  1  0  1
    
    $`2`
      V1 V2 V3 V4 V5
    1  1  1  0  0  0
    2  0  0  0  1  1
    3  0  1  1  0  1
    4  1  1  0  1  1
    5  0  0  1  1  1
    6  0  0  1  0  1
    
    $`3`
      V1 V2 V3 V4 V5
    1  1  1  0  0  0
    2  0  0  0  1  1
    3  0  1  1  0  1
    4  1  1  0  1  1
    5  0  0  1  1  1
    6  0  0  1  0  1
    
    $`4`
      V1 V2 V3 V4 V5
    1  1  1  0  0  0
    2  0  0  0  1  1
    3  0  1  1  0  1
    4  1  1  0  1  1
    5  0  0  1  1  1
    6  0  0  1  0  1
    
    $`5`
      V1 V2 V3 V4 V5
    1  1  1  0  0  0
    2  0  0  0  1  1
    3  0  1  1  0  1
    4  1  1  0  1  1
    5  0  0  1  0  1
    6  0  0  1  0  1
    
    attr(,"class")
    [1] "mild" "list"
    
    [[2]]
    $`1`
      V1 V2 V3 V4 V5
    1  1  0  0  1  0
    2  1  0  0  0  1
    3  0  0  1  1  1
    4  1  0  1  0  1
    5  1  1  0  0  1
    6  0  0  1  1  1
    
    $`2`
      V1 V2 V3 V4 V5
    1  1  0  0  1  0
    2  1  0  0  0  1
    3  0  0  1  1  1
    4  1  0  1  0  1
    5  1  1  0  0  1
    6  0  0  1  1  1
    
    $`3`
      V1 V2 V3 V4 V5
    1  1  0  0  1  0
    2  1  0  0  0  1
    3  0  0  1  1  1
    4  1  0  1  0  1
    5  1  1  0  0  1
    6  0  0  1  1  1
    
    $`4`
      V1 V2 V3 V4 V5
    1  1  0  0  1  0
    2  1  0  0  0  1
    3  0  0  1  1  1
    4  1  0  1  0  1
    5  1  1  0  0  1
    6  0  0  1  1  1
    
    $`5`
      V1 V2 V3 V4 V5
    1  1  0  0  1  0
    2  1  0  0  0  1
    3  0  0  1  1  1
    4  1  0  1  1  1
    5  1  1  0  0  1
    6  0  0  1  1  1
    
    attr(,"class")
    [1] "mild" "list"
    
    [[3]]
    $`1`
      V1 V2 V3 V4 V5
    1  1  1  0 NA  0
    2  0  0  0  1  0
    3  1  1  1  0  0
    4  0  0  1  1  1
    5  0  0  1 NA  1
    6  0  0  0  1  0
    
    $`2`
      V1 V2 V3 V4 V5
    1  1  1  0 NA  0
    2  0  0  0  1  0
    3  1  1  1  0  0
    4  0  0  1  1  1
    5  0  0  1 NA  1
    6  0  0  0  1  0
    
    $`3`
      V1 V2 V3 V4 V5
    1  1  1  0 NA  0
    2  0  0  0  1  0
    3  1  1  1  0  0
    4  0  0  1  1  1
    5  0  0  1 NA  1
    6  0  0  0  1  0
    
    $`4`
      V1 V2 V3 V4 V5
    1  1  1  0 NA  0
    2  0  0  0  1  0
    3  1  1  1  0  0
    4  0  0  1  1  1
    5  0  0  1 NA  1
    6  0  0  0  1  0
    
    $`5`
      V1 V2 V3 V4 V5
    1  1  1  0 NA  0
    2  0  0  0  1  0
    3  1  1  1  0  0
    4  0  0  1  1  1
    5  0  0  1 NA  1
    6  0  0  0  1  0
    
    attr(,"class")
    [1] "mild" "list"
    
    Warning messages:
    1: Number of logged events: 30 
    2: Number of logged events: 30 
    3: Number of logged events: 2 
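
The result above is a list of lists: one element per incomplete data set, each holding its five completed data frames. As a minimal sketch of how these could be accessed for analysis (the object names `completed`, `flat` and `v4_means`, the `set1`, `set2`, ... labels, and the column mean used as a stand-in analysis are assumptions for illustration, not part of the original answer), the nested list can be indexed directly or flattened one level:

```r
library(mice)

# Sample data from the question: three 6 x 5 binary matrices with missing values
raw_data <- list(
  structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
              0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
              1, 0, 1, 1, 0, NA, NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
              1, 1, 0, NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5)
)

# Impute each incomplete data set and keep its m = 5 completed versions
completed <- lapply(raw_data, function(d) {
  imp <- mice(d, m = 5, method = "pmm", seed = 42, printFlag = FALSE)
  complete(imp, action = "all", include = FALSE)
})

# Hypothetical labels so every completed data set can be traced to its source
names(completed) <- paste0("set", seq_along(completed))

# Direct access: second completed version of the first incomplete data set
completed$set1[[2]]

# Flatten one level: a single list with one data frame per imputation
flat <- unlist(completed, recursive = FALSE)
length(flat)
names(flat)

# Placeholder analysis applied to every completed data set
# (stays NA wherever an imputation left missing values in V4)
v4_means <- sapply(flat, function(d) mean(d$V4))
v4_means
```

With the full list of 100 incomplete data sets in place of `raw_data`, the same calls would produce a flat list of 500 data frames. Keeping the nested form instead may be preferable when results have to be combined per incomplete data set.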
    

    Notes

    1. For a serious simulation study, you probably need to set m somewhat higher; see an earlier answer.
    2. In your example, imputation of the third dataset fails due to collinearities, so its completed data sets still contain NA in V4. You can investigate by setting printFlag=TRUE and not piping into complete.
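
One way to follow up on note 2 is sketched below: it keeps the mids object for the third data set instead of piping it into complete, then inspects the events that mice logged. The names `third`, `imp3` and `na_left` are assumptions for illustration. `loggedEvents` is the component of the mids object behind the "Number of logged events" warnings; it is NULL when nothing was logged.

```r
library(mice)

# Third incomplete data set from the question (V1 and V2 are identical columns)
third <- structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
                     1, 1, 0, NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5)

# Keep the mids object so its diagnostics remain available
imp3 <- mice(third, m = 5, method = "pmm", seed = 42, printFlag = FALSE)

# Variables that mice dropped or flagged, with the reason for each
imp3$loggedEvents

# Number of values still missing in each of the five completed data sets
na_left <- sapply(complete(imp3, action = "all"), function(d) sum(is.na(d)))
na_left
```

A non-zero count in `na_left` marks a data set that should be excluded or re-imputed with a different setup before it enters the analysis.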

    Data:

    > dput(raw_data)
    list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
    0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 
    1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 
    NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1, 
    0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA, 
    1, 0, 0, 0, 1, 1, 0), dim = 6:5))