Tags: r, group-by, interpolation, na, imputets

R: Interpolate values by group


I have a data frame of European states in which each state occurs 10 times (once per day for 10 days). I want to interpolate the NA values in multiple columns, which I can do for the whole data frame using

library("imputeTS")
na_interpolation(dataframe)

But I want to interpolate the NA values separately for each state, so that values from one state are never used to fill gaps in another. How can that be done? I have already tried a lot of different solutions, but none of them worked for me.

As pseudo-code I would like to have something like

na_interpolation(dataframe, groupby=state)

Anything that could work?

Unfortunately, this code sample did not work for me:

interpolation <- dataframe %>% 
  group_by(state-name) %>% 
  na_interpolation(dataframe)

Solution

  • You could use the split-apply-bind method:

    do.call(rbind, lapply(split(dataframe, dataframe$state), na_interpolation))
    

    As a worked example, take the following dummy data:

    set.seed(3)
    
    dataframe <- data.frame(state = rep(c("A", "B", "C"), each = 5),
                            value = rnorm(15))
    
    dataframe$value[sample(15, 4)] <- NA
    
    dataframe
    #>    state       value
    #> 1      A -0.96193342
    #> 2      A          NA
    #> 3      A  0.25878822
    #> 4      A -1.15213189
    #> 5      A  0.19578283
    #> 6      B  0.03012394
    #> 7      B  0.08541773
    #> 8      B          NA
    #> 9      B          NA
    #> 10     B  1.26736872
    #> 11     C -0.74478160
    #> 12     C          NA
    #> 13     C -0.71635849
    #> 14     C  0.25265237
    #> 15     C  0.15204571
    

    Then we can do:

    library(imputeTS)
    
    do.call(rbind, lapply(split(dataframe, dataframe$state), na_interpolation))
    #>      state       value
    #> A.1      A -0.96193342
    #> A.2      A -0.35157260
    #> A.3      A  0.25878822
    #> A.4      A -1.15213189
    #> A.5      A  0.19578283
    #> B.6      B  0.03012394
    #> B.7      B  0.08541773
    #> B.8      B  0.47940140
    #> B.9      B  0.87338506
    #> B.10     B  1.26736872
    #> C.11     C -0.74478160
    #> C.12     C -0.73057004
    #> C.13     C -0.71635849
    #> C.14     C  0.25265237
    #> C.15     C  0.15204571
    

    Created on 2020-12-12 by the reprex package (v0.3.0)
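If you would rather stay close to the pipe syntax from the question, a grouped `mutate()` is one possible alternative. The sketch below is not part of the reprex above. It assumes dplyr 1.0.0 or later (for `across()`), and the columns `cases` and `tests` are hypothetical names on made-up data. It applies `na_interpolation()` to each numeric column within each state, then builds the split-apply-bind result with the `A.1`-style row names reset so the two can be compared.

```r
library(dplyr)     # assumption: dplyr >= 1.0.0, needed for across()
library(imputeTS)

# Hypothetical toy data: two states, five days each, two numeric columns
dataframe <- data.frame(
  state = rep(c("A", "B"), each = 5),
  cases = c(1, NA, 3, NA, 5, 10, NA, NA, 40, 50),
  tests = c(2, 4, NA, 8, 10, NA, 20, 30, NA, 50)
)

# Grouped version: interpolate every numeric column separately within each state
by_group <- dataframe %>%
  group_by(state) %>%
  mutate(across(where(is.numeric), ~ na_interpolation(.x))) %>%
  ungroup() %>%
  as.data.frame()

# Split-apply-bind version, with the "A.1"-style row names dropped afterwards
by_split <- do.call(rbind, lapply(split(dataframe, dataframe$state), na_interpolation))
rownames(by_split) <- NULL

# Compare the interpolated columns of the two approaches
all.equal(by_group$cases, by_split$cases)
all.equal(by_group$tests, by_split$tests)
```

The attempt in the question fails for two reasons: `state-name` is parsed as a subtraction, and `na_interpolation()` is handed the whole data frame rather than being applied per group inside `mutate()`. One caveat for both approaches: when called on a single vector, `na_interpolation()` needs at least two non-NA values, so a state with fewer observations in some column would need separate handling.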