rdatetimedplyr

Calculate the mean for date and hour from a multi year dataframe in r


I have a data frame that looks like this:

structure(list(datetime = structure(c(1559347200, 1559350800, 
1559354400, 1559358000, 1559361600, 1559365200, 1559368800, 1559372400, 
1559376000, 1559379600), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), BUSwt = c(77.8154, 77.2538, 76.9325, 76.81235, 76.6544, 
76.3403, 76.16165, 76.22375, 76.47035, 76.74395), MBwt = c(78.3653, 
78.53495, 78.6074, 78.30725, 78.0368, 77.9477, 77.8946, 77.927, 
78.12455, 78.23615), SUSwt = c(76.2575, 76.0055, 75.7265, 75.425, 
75.137, 74.849, 74.651, 74.624, 74.696, 74.8535)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

It has 5 years of datetime data for the three variable columns "BUSwt", "MBwt", and "SUSwt". I want to calculate the mean, min, and max of of these variable for each specific date and hour over the course of the 5 years of data. An example of this data would look like:

datetime BUSwt_mean BUSwt_max BUSwt_min MBwt_mean MBwt_max MBwt_min SUSwt_mean SUSwt_max SUSwt_min
06-01 00:00:00 xx xx xx xx xx xx xx xx xx
06-01 01:00:00 xx xx xx xx xx xx xx xx xx
06-01 02:00:00 xx xx xx xx xx xx xx xx xx
06-01 03:00:00 xx xx xx xx xx xx xx xx xx
06-01 04:00:00 xx xx xx xx xx xx xx xx xx

My current code to make the data frame looks like this:

WaterData <- BUSdata %>%
  left_join(MBdata, by = "datetime") %>%
  left_join(SUSdata, by = "datetime")

Is there a way to do this continuing in a pipe?


Solution

  • Another way to get what you need, using dplyr's handy across() function:

    library(tidyverse)
    df %>%
      mutate(month = month(datetime),
             day = day(datetime),
             hour = hour(datetime)) %>%
      group_by(month, day, hour) %>%
      summarise(across(.cols = c("BUSwt", "MBwt", "SUSwt"),
                       .fns = list("min" = min, "mean" = mean, "max" = max))) %>%
      ungroup()
    

    This solution separates month, day, and hour into their own columns. If you really need a date format (e.g., for plotting purposes), then you can coerce everything to the same year (doesn't matter what year it is), like this:

    df %>%
      mutate(datetime = `year<-`(datetime, 2000)) %>%
      group_by(datetime) %>%
      summarise(across(.cols = c("BUSwt", "MBwt", "SUSwt"),
                       .fns = list("min" = min, "mean" = mean, "max" = max))) %>%
      ungroup()