I want to use zoo::na.approx
(but not married to this function) to fill in a response variable for the missing days in my dataframe. I'm having a tough time figuring out how to add the NAs to the original dataframe so that na.approx can fill them in.
My dataframe looks something like this:
df<-data.frame(trt=c("A", "A", "A", "A", "B", "B", "B", "B"),
day = c(1,3,7,9,1,5,8,9),
value = c(7,12,5,7,5,6,11,8),
stringsAsFactors=FALSE)
I want every day to be in the dataframe with "NA" for each day where I don't have data.
I have used something like this to expand my dataset:
library(dplyr)
days_possible <- expand.grid(
day = seq(from=min(df$day), max(df$day), by=1),
trt = c("A", "B"),
stringsAsFactors = FALSE
)
new_df<- df %>%
right_join(days_possible, by = c("trt", "day"))
My problem is that I have a bunch of sites, years, and a few treatment columns, so somewhere it seems to all get messed up and in my days_possible
dataframe, I don't get it right.
Is there a function to avoid this mess, expand one column, and have all the other columns expand in a tidy fashion? I'm looking at modelr::data_grid
, but by itself I am not sure how to get the final desired result - an ordered dataframe I can group by treatments and use approximation to fill in missing days.
We can use the complete
and full_seq
functions from the tidyr
package. The final as.data.frame()
is not required. I just added it to print the output as a data frame.
library(tidyr)
df2 <- df %>%
complete(trt, day = full_seq(day, period = 1)) %>%
as.data.frame()
df2
# trt day value
# 1 A 1 7
# 2 A 2 NA
# 3 A 3 12
# 4 A 4 NA
# 5 A 5 NA
# 6 A 6 NA
# 7 A 7 5
# 8 A 8 NA
# 9 A 9 7
# 10 B 1 5
# 11 B 2 NA
# 12 B 3 NA
# 13 B 4 NA
# 14 B 5 6
# 15 B 6 NA
# 16 B 7 NA
# 17 B 8 11
# 18 B 9 8