I have data rows with a common date and record number, but there are different columns with data in each record, such as in this data:
my_data <- structure(list(Station_Number = c("0045", "0045", "0045", "0045"),
Station_Name = c("Scottsbluff 2NW", "Scottsbluff 2NW", "Scottsbluff 2NW", "Scottsbluff
2NW"), NWSLI = c("SBMN1", "SBMN1", "SBMN1", "SBMN1"), Station_Legacy_Name =
c("Scottsbluff", "Scottsbluff", "Scottsbluff", "Scottsbluff"), Lat = c(41.89175,
41.89175, 41.89175, 41.89175), Lon = c(-103.6814722, -103.6814722, -103.6814722,
-103.6814722), Elev_m = c("1198.54", "1198.54", "1198.54", "1198.54"), TIMESTAMP =
c("2017-06-18 04:00:00","2017-06-18 04:00:00", "2017-06-18 04:01:00", "2017-06-18 04:01:00"), RECORD = c(214037L, 214037L, 214038L, 214038L), Ta_2m_Avg = c(NA_real_,
NA_real_, NA_real_, NA_real_), TaMax_2m = c(NA_real_, NA_real_, NA_real_, NA_real_),
LithBatt_Min = c(NA_real_, NA_real_, NA_real_, NA_real_), MaintMode = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_), WndSpd5s_3m_Avg = c(1.373, NA, 1.786, NA),
WndMaxSpd5s_3m_Avg = c(NA, 1.373, NA, 1.786), source =
c("E:\\Data_Collected_withDups/Scottsbluff/NE-Scottsbluff-CR1000-Table018-dat",
"E:\\Data_Collected_withDups/Scottsbluff/NE-Scottsbluff-CR1000-Table018-dat.1.backup",
"E:\\Data_Collected_withDups/Scottsbluff/NE-Scottsbluff-CR1000-Table018-dat",
"E:\\Data_Collected_withDups/Scottsbluff/NE-Scottsbluff-CR1000-Table018-dat.1.backup"
)), row.names = c(2989579L, 5089877L, 2989580L, 5089878L), class = "data.frame")
Ignoring the last column, I want to group by TIMESTAMP and fill everything. I've tried various versions of
my_data[,1:15] |> group_by(TIMESTAMP) |> tidyr::fill(everything(), .direction = 'up') |>
head()
Shouldn't this produce a data frame with two rows, with "WndSpd5s_3m_Avg" and "WndMaxSpd5s_3m_Avg" variables filled in? I've tried many versions of this with no luck. I don't know how/if to use the head() part of code in this case. Any help would be great.
Assuming that you want to obtain the (single) non-missing entries of the WndSpd columns per timestamp, you can simply do:
library(tidyverse)
my_data %>%
group_by(TIMESTAMP) %>%
summarize(across(c('WndSpd5s_3m_Avg', 'WndMaxSpd5s_3m_Avg'), na.omit))
#> # A tibble: 2 × 3
#> TIMESTAMP WndSpd5s_3m_Avg WndMaxSpd5s_3m_Avg
#> <chr> <dbl> <dbl>
#> 1 2017-06-18 04:00:00 1.37 1.37
#> 2 2017-06-18 04:01:00 1.79 1.79
Created on 2023-11-13 with reprex v2.0.2
Using fill does not change the number of rows of your data set (just like mutate), summarize instead will end up with one row per group (here number of unique timestamps)