
LOCF imputation and how to fill missing entries


I'm working with the following dataset and trying to fill in the missing entries of the VISUAL52 variable by imputing them with the LOCF (Last Observation Carried Forward) method.

library(readr)
library(mice)
library(finalfit)
library(Hmisc)
library(lattice)
library(VIM)
library(rms)
library(zoo)
library(RcmdrMisc)  # provides numSummary() and binnedCounts()

> hw3
# A tibble: 240 x 11
   treat LINE0 LOST4 LOST12 LOST24 LOST52 VISUAL0 VISUAL4 VISUAL12 VISUAL24 VISUAL52
   <fct> <dbl> <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>
 1 2        12     1      3     NA     NA      59      55       45       NA       NA
 2 2        13    -1      0      0      2      65      70       65       65       55
 3 1         8     0      1      6     NA      40      40       37       17       NA
 4 1        13     0      0      0      0      67      64       64       64       68
 5 2        14    NA     NA     NA     NA      70      NA       NA       NA       NA
 6 2        12     2      2      2      4      59      53       52       53       42
 7 1        13     0     -2     -1      0      64      68       74       72       65
 8 1         8     1      0      1      1      39      37       43       37       37
 9 2        12     1      2      1      1      59      58       49       54       58
10 1        10     0     -4     -4     NA      49      51       71       71       NA
# ... with 230 more rows

I'm not sure whether I've done it correctly, but I've tried to describe the sample size, mean and standard error of the VISUAL52 variable per treatment as follows (please let me know if a different function would have been better).

numSummary(hw3[,"VISUAL52", drop=FALSE], groups=hw3$treat, 
           statistics=c("mean", "se(mean)", "quantiles"), 
           quantiles=c(0,.25,.5,.75,1))

binnedCounts(hw3[hw3$treat == '1', "VISUAL52", drop=FALSE])
# treat = 1

binnedCounts(hw3[hw3$treat == '2', "VISUAL52", drop=FALSE])
# treat = 2
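On the question of whether another function would be better: numSummary() and binnedCounts() come from the RcmdrMisc package, and the same per-treatment sample size, mean and standard error can also be computed in base R. A minimal sketch, using only the treat and VISUAL52 values of the ten rows printed above (on the full data you would pass hw3$VISUAL52 and hw3$treat instead); the helper name summ is just for illustration:

```r
# treat and VISUAL52 for the ten rows printed in the question
first10 <- data.frame(
  treat    = factor(c(2, 2, 1, 1, 2, 2, 1, 1, 2, 1)),
  VISUAL52 = c(NA, 55, NA, 68, NA, 42, 65, 37, 58, NA)
)

# n (non-missing values only), mean and standard error of the mean for one group
summ <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  c(n = n, mean = mean(x), se = sd(x) / sqrt(n))
}

# split() gives one vector per treatment; sapply() returns a 3 x 2 matrix,
# which is transposed so that each row is one treatment group
res <- t(sapply(split(first10$VISUAL52, first10$treat), summ))
res
```

Missing values are dropped before counting, so n is the number of patients with an observed VISUAL52 in each arm.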

For the imputation part, however, I ran the function nafill() from the data.table package, and I get the error below after running the complete() function.

 library(data.table)
 imp_locf <-  nafill(hw3$VISUAL52, "locf", nan=NA)
 data_imputed <- complete(imp_locf)

Error in UseMethod("complete_") : 
  no applicable method for 'complete_' applied to an object of class "c('double', 'numeric')"

I'm wondering why the function returns this error, and whether anyone knows alternative ways to impute the missing data in the dataset with the LOCF method.
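For reference, nafill() already returns the filled numeric vector, so there is nothing left for complete() to do; mice's complete() is meant for the mids object returned by mice(), not for a plain numeric vector. A minimal sketch using the VISUAL52 values of the ten rows printed above:

```r
library(data.table)

# VISUAL52 for the ten rows printed in the question
visual52 <- c(NA, 55, NA, 68, NA, 42, 65, 37, 58, NA)

# nafill() returns the filled vector itself, so it can be assigned directly;
# type = "locf" copies the last non-missing value down the vector
filled <- nafill(visual52, type = "locf")
filled

# On the full data the equivalent would be:
# hw3$VISUAL52 <- nafill(hw3$VISUAL52, type = "locf")
```

The leading NA stays missing because nothing precedes it, and every filled value is copied from the row above, that is, from a different patient.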


Solution

  • If you want to apply LOCF to your whole dataset, you can use the imputeTS package.

     library(imputeTS)
     hw3 <- na_locf(hw3)
     hw3
    

    Or, if you only want to apply LOCF to the VISUAL52 variable:

     library(imputeTS)
     hw3$VISUAL52 <- na_locf(hw3$VISUAL52)
     hw3
    

    Also keep in mind that other algorithms might be better suited to your data: imputeTS offers several more functions aimed specifically at time-series imputation, and the mice package you already load provides additional algorithms for cross-sectional data.
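One point worth checking: na_locf(hw3$VISUAL52) fills down the rows, and applied to the whole data frame imputeTS likewise works column by column, so a missing VISUAL52 is replaced by the previous row's value. Assuming each row of hw3 is one patient and the VISUAL columns are ordered by visit time, the "last observation" for a patient usually lies in the earlier VISUAL columns of the same row. A minimal base-R sketch of that row-wise LOCF, using the VISUAL columns of the first five rows printed in the question (locf_row is a hypothetical helper name):

```r
# VISUAL columns of the first five rows printed in the question
visual <- data.frame(
  VISUAL0  = c(59, 65, 40, 67, 70),
  VISUAL4  = c(55, 70, 40, 64, NA),
  VISUAL12 = c(45, 65, 37, 64, NA),
  VISUAL24 = c(NA, 65, 17, 64, NA),
  VISUAL52 = c(NA, 55, NA, 68, NA)
)

# LOCF within one patient: walk the visit columns left to right and
# replace each NA with the most recent earlier value (assumes time-ordered columns)
locf_row <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# apply() works row by row and returns one column per row, hence t()
visual_locf <- as.data.frame(t(apply(visual, 1, locf_row)))
visual_locf$VISUAL52
```

Here patient 1 gets VISUAL12 (45) carried forward to week 52, and patient 5, observed only at baseline, gets the baseline value 70. Whether carrying a baseline value all the way to week 52 is acceptable is a modelling decision, which is one reason to also consider the alternatives mentioned above.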