
Fill a limited number of values with tidyr::fill


I have a data frame and I am using:

library(dplyr)
library(tidyr)

df <- data.frame(dates = seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = 1),
                 category = c(rep("a", 5), rep("b", 5)),
                 values = c(1, rep(NA, 4), 5, 6, rep(NA, 3)))

df %>% group_by(category) %>% fill(values)

but I would like fill to carry values forward only a limited number of places (i.e. stop carrying forward once a row is too far from the value being carried). Is there a simple way of doing this without a for loop?

In this example, I would like to stop filling once the date is more than 2 days after the last non-NA value, so the values column would be:

values = c(1,1,1,NA,NA, 5,6,6,6,NA)

Thank you


Solution

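  • One way to do it is to first fill(values) and then convert back to NA any values whose date is more than two days after the last non-NA date in the group (i.e. max(dates[!is.na(values)])). Because this uses the group's latest non-NA date, it gives the right result here, where every observed value in a group comes before its run of NAs; a group with a long gap followed by a later observation would need the date of the most recent observation for each row instead.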

    library(dplyr)
    library(tidyr)
    
    df %>% 
      group_by(category) %>% 
      mutate(new_date = max(dates[!is.na(values)]), diff1 = as.numeric(difftime(dates, new_date)/(24*3600))) %>% 
      fill(values) %>% 
      mutate(values = replace(values, which(diff1 > 2), NA)) %>% 
      select(dates:values)
    
    #Source: local data frame [10 x 3]
    #Groups: category [2]
    
    #        dates category values
    #       (date)   (fctr)  (dbl)
    #1  2016-01-01        a      1
    #2  2016-01-02        a      1
    #3  2016-01-03        a      1
    #4  2016-01-04        a     NA
    #5  2016-01-05        a     NA
    #6  2016-01-06        b      5
    #7  2016-01-07        b      6
    #8  2016-01-08        b      6
    #9  2016-01-09        b      6
    #10 2016-01-10        b     NA
    

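    Note that difftime was returning seconds, so I converted to days manually. With the default units = "auto", difftime picks its units from the smallest absolute difference, which is 0 here, so it falls back to seconds. Passing units = "days" (or simply computing dates - new_date) returns days directly.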
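If a group can have a long run of NAs followed by a later observation, each row needs the date of its own most recent observation rather than the group maximum. Below is a minimal sketch of that variant. It assumes the column names from the question and that rows are already sorted by dates within each category (fill works in row order). It fills a helper date column alongside values, then blanks out carried values that are too old. The function name fill_limited, its max_gap argument, the last_obs column and the df2 example frame are hypothetical names introduced only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of tidyr): carry `values` forward within each
# `category`, but only up to `max_gap` days after the last non-NA value.
# Assumes rows are sorted by `dates` within each category.
fill_limited <- function(data, max_gap = 2) {
  data %>%
    group_by(category) %>%
    # Date of each non-NA observation; NA on rows that still need filling
    mutate(last_obs = if_else(is.na(values), as.Date(NA), dates)) %>%
    # Fill values and observation dates together, so every row knows
    # when the value it carries was actually observed
    fill(values, last_obs) %>%
    # Blank out carried values that are more than max_gap days old
    mutate(values = if_else(as.numeric(dates - last_obs) > max_gap,
                            NA_real_, values)) %>%
    ungroup() %>%
    select(-last_obs)
}

df <- data.frame(dates = seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = 1),
                 category = c(rep("a", 5), rep("b", 5)),
                 values = c(1, rep(NA, 4), 5, 6, rep(NA, 3)))

res <- fill_limited(df, max_gap = 2)
res$values
# expected: 1 1 1 NA NA 5 6 6 6 NA

# Hypothetical example: a long gap *before* a later observation
df2 <- data.frame(dates = as.Date("2016-01-01") + 0:5,
                  category = "a",
                  values = c(1, NA, NA, NA, 7, NA))
fill_limited(df2)$values
# expected: 1 1 1 NA 7 7
```

With df2, the answer above keeps the carried value on 2016-01-04 (its diff1 is negative because the group's latest observation is 2016-01-05), whereas this sketch blanks it. Using row_number() instead of dates in the helper column should, under the same assumptions, limit the fill by a number of rows ("places") rather than days.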