rdataframedplyrdata-manipulationdata-mining

Add rows to the weather data for each day, placing the corresponding date at the top


I have weather data recorded at 15-minute intervals in R, and I need to format the data such that each day's data is followed by two blank rows with the corresponding date written on the top. Here's a glimpse of the required output:

enter image description here

Desired export file format

enter image description here

I have a dataframe (df) with columns date, temperature, precipitation, and wind_speed and looks like this. How can I achieve this formatting in R?

enter image description here

Here is the reproducible example:

df <- structure(list(date = structure(c(1401104700, 1401105600, 1401106500, 
                                        1401107400, 1401108300, 1401148800, 1401149700, 1401150600, 1401151500, 
                                        1401152400, 1401235200, 1401236100, 1401237000, 1401237900, 1401238800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), temperature = c(25, 
                                                                   25.2, 25.3, 25.1, 25.4, 18.6, 18.3, 18.2, 18.2, 18.2, 19.7, 19.1, 
                                                                   18.7, 18.5, 18.3), precipitation = c(0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                        0, 0, 0, 0, 0, 0, 0), wind_speed = c(1.006, 1.006, 0.9054, 0.6036, 
                                                                                                                                             0.4024, 0.1006, 0.2012, 0.503, 0.1006, 0, 0, 0.1006, 0.2012, 
                                                                                                                                             0.1006, 0.2012)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                                                                                                                                                       -15L))

Solution

  • I first create a function that can take the data for one day from your data frame and add the desired rows at the beginning of that block. It also converts the dates to the format you show in your question. I use the function paste_comma() to add commas after all the values.

    paste_comma <- function(x) paste0(x, ",")
    
    add_header <- function(df) {
      day <- format(df$date[1], "%d_%m_%y")
      header <- tibble(
        date = c("", "", day, "Time,"),
        temperature = c("", "", "", "Temp. (f),"),
        precipitation = c("", "", "", "Precipitation,"),
        wind_speed = c("", "", "", "Wind Speed,")
      )
      df <- df %>%
        mutate(date = format(date, format = "%Y%m%d%H%M PST")) %>%
        mutate(across(everything(), paste_comma))
      bind_rows(header, df)
    }
    

    Next, I group the data frame by day and apply the function to each group. I use reframe() because I want to return multiple rows per group.

    df_new <- df %>%
      group_by(day = floor_date(date, "day")) %>%
      reframe(add_header(pick(everything()))) %>%
      select(-day)
    

    When you write the tsv file, make sure to omit the headers:

    write_tsv(df_new, "data.tsv", col_names = FALSE)
    

    This is how the output looks in Excel:

    enter image description here