I am looking to populate missing values in my table with non-NA value of the closest date, be it before or after the reference date. This means that a table like:
date value
03.03.2023 1
04.03.2023 NA
06.03.2023 4
09.03.2023 NA
10.03.2023 3
Would be filled as:
date value
03.03.2023 1
04.03.2023 1
06.03.2023 4
09.03.2023 3
11.03.2023 3
Explanation: Since 03.03. is closer to 04.03., locf is used. Yet since 11.03. is closer to 09.03., nocb (locf, fromLast = T) is used.
Eventual conflicts could take place if a NA value is flanked by two values of equal distance to the reference date. In this case, I would like locf to be preferred.
The code I have at the moment uses the stiff "locf" twice (once as standard and once as fromLast) and is not as flexible:
read.csv("path/to/merged_data.csv",
colClasses = c("Date", "numeric", "numeric", "numeric", "character")) %>%
group_by(field_id) %>%
arrange(date) %>%
mutate(
Nearest_l8_locf = ifelse(!is.na(NDVI_l7) & is.na(NDVI_l8), na.locf(NDVI_l8), NDVI_l8),
Nearest_s2_locf = ifelse(!is.na(NDVI_l7) & is.na(NDVI_s2), na.locf(NDVI_s2), NDVI_s2),
Nearest_l8_locb = ifelse(!is.na(NDVI_l7) & is.na(NDVI_l8), na.locf(NDVI_l8, fromLast = TRUE), NDVI_l8),
Nearest_s2_locb = ifelse(!is.na(NDVI_l7) & is.na(NDVI_s2), na.locf(NDVI_s2, fromLast = TRUE), NDVI_s2)
) %>%
filter(!is.na(NDVI_l7)) %>%
select(-NDVI_l8, -NDVI_s2) %>%
relocate(field_id, .after = last_col()) %>%
write_csv(file.path(results, "merged_data_interpolated.csv"))
In my actual case, the reference date are all dates for which a column (NDVI_l7) is not NA and the procedure to populate NA is done for two other columns (NDVI_l8 and NDVI_s2). It is also grouped by the column "field_id" since dates are repeated for each of those ID.
How can I adapt the code so that NA values are populated with the values of the closest date, regardless of where it is in the column?
another variant with base R only:
d
being your example data:
d <- structure(list(date = structure(c(19419, 19420, 19422, 19425,
19426), class = "Date"), value = c(1L, NA, 4L, NA, 3L)), row.names = c(NA,
5L), class = "data.frame")
convert column date to class Date
:
d$date <- as.Date(d$date, '%d.%m.%Y')
exploit the dist
ance function to find closest neighbour:
impute_from_neighbours <- function(values, dates){
dists <- dist(dates) |> as.matrix()
dists[dists == 0] <- NA
na_pos <- which(is.na(values))
closest_non_na_pos <- apply(dists[, na_pos], 2, which.min)
values[na_pos] <- values[closest_non_na_pos]
values
}
d$value <- impute_from_neighbours(d$value, d$date)
output:
> d
date value
1 2023-03-03 1
2 2023-03-04 1
3 2023-03-06 4
4 2023-03-09 3
5 2023-03-10 3