r dataframe time-series imputets

Iteratively filling a new column in a for loop in R


I'm working with a large dataset in which multiple locations are measured monthly, but each site has a different number of measurements and NAs, creating a broken time series. To get around this, I've written a for loop that goes through each site and fills in the gaps using an interpolation technique. This gives me an interpolated output, which I would ideally like to add back into the original dataset. For example:

library(imputeTS)

Sites = c(rep("A", 5), rep("B", 4), rep("C", 10))
Meas = c(25,20,NA,21,NA,23,21,22,26,27,15,20,NA,25,NA,28,28,27,NA)

df= data.frame(Sites, Meas)

for(i in unique(Sites)) {
  # rows for site i only
  d = subset(df, Sites == i)
  # interpolated values for this site
  d$fit = na.interpolation(d$Meas)
}

What I would like is to take d$fit and match it back into a new column, df$fit, so that each interpolated value lines up with the right site and measurement. Any suggestions, or complete overhauls to my approach? Thanks in advance!


Solution

  • It's not often that you actually need for loops. You can do this particular task with the ave() function:

    df$fit <- ave(df$Meas, df$Sites, FUN=na.interpolation)
    

    Here ave() splits the Meas values by Sites, applies na.interpolation to each site's values separately, and puts the results back in the original row order.
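
    To see the grouping behaviour without installing imputeTS, here is a minimal sketch on the data from the question. fill_linear is a hypothetical base-R stand-in (linear interpolation via approx(), with the nearest observed value carried to the ends), not the imputeTS function, and it assumes every site has at least two non-missing values.

    ```r
    # Hypothetical stand-in for an interpolation function (base R only).
    # Interpolates linearly over the position index; rule = 2 carries the
    # nearest observed value out to leading/trailing NAs.
    fill_linear <- function(x) {
      idx <- seq_along(x)
      ok  <- !is.na(x)
      approx(idx[ok], x[ok], xout = idx, rule = 2)$y
    }

    Sites <- c(rep("A", 5), rep("B", 4), rep("C", 10))
    Meas  <- c(25,20,NA,21,NA,23,21,22,26,27,15,20,NA,25,NA,28,28,27,NA)
    df    <- data.frame(Sites, Meas)

    # Each site's values are filled separately; df keeps its row order
    df$fit <- ave(df$Meas, df$Sites, FUN = fill_linear)

    df[df$Sites == "A", ]   # the two NAs in site A are filled from site A only
    ```

    Because the function is applied per site, the trailing NA in site A is filled from site A's last observed value rather than from the first value of site B.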

    Another strategy you could use for something more complex is split/unsplit. Something like:

    ss <- split(df$Meas, df$Sites)
    df$fit <- unsplit(lapply(ss, na.interpolation), df$Sites)
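
    The split/unsplit round trip can be checked on a small made-up data frame where the sites are interleaved rather than stored in blocks. This is only a sketch: the data are invented for illustration, and fill_linear is again a hypothetical base-R stand-in for the interpolation function, assuming at least two non-missing values per site.

    ```r
    # Hypothetical stand-in for an interpolation function (base R only)
    fill_linear <- function(x) {
      idx <- seq_along(x)
      ok  <- !is.na(x)
      approx(idx[ok], x[ok], xout = idx, rule = 2)$y
    }

    # Made-up example: rows for sites A and B alternate
    df <- data.frame(
      Sites = c("A", "B", "A", "B", "A", "B"),
      Meas  = c(10, 1, NA, NA, 30, 3)
    )

    ss     <- split(df$Meas, df$Sites)   # list(A = c(10, NA, 30), B = c(1, NA, 3))
    filled <- lapply(ss, fill_linear)    # fill each site's vector on its own
    df$fit <- unsplit(filled, df$Sites)  # put values back at their original rows

    df$fit
    # 10  1 20  2 30  3
    ```

    The intermediate list (ss, filled) is what makes this form easier to extend: each element can be inspected or processed by a longer function before unsplit() restores the original row positions.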