rdatetimesubsetweather

Subsetting a custom time range in base R


I need to get the list of temperatures from a weather station from 9am to 9am the next day in R and set their dates to the second day. A simplified version of the data looks like

DateTime Temp
2024-05-06 08:23:01 12
2024-05-06 09:27:01 10
2024-05-06 12:53:31 5
2024-05-07 01:28:16 30
2024-05-07 05:07:59 30
2024-05-07 09:36:48 9

I would like to produce a dataframe of only the 9am:9am values so that

DateTime Temp
2024-05-07 10
2024-05-07 5
2024-05-07 30
2024-05-07 30

My function to do this is

subnine=function(esheet){
 mydates<-unique(as.Date(esheet$DateTime))
 sheet9<-esheet[0,]
 esheet$DateTime<-as.POSIXlt(esheet$DateTime)
 for (l in 2:length(mydates))
{ 
   dateslice<-subset(esheet,(as.Date(esheet$DateTime)==mydates[l-1]&esheet$DateTime$hour>=9)|(as.Date(esheet$DateTime)==mydates[l] & esheet$Datetime$hour<9))
   dateslice$DateTime<-mydates[l]+1
  sheet9<-rbind(sheet9,dateslice)
 }
return(sheet9)
}

It is returning only the pre 9am values for the second day, so

DateTime Temp
2024-05-07 30
2024-05-07 30

What is preventing the first part of the or(|) statement in my subset from selecting the first day's post 9am values? The logical statement works outside of the function/loop.


Solution

  • First, make sure that DateTime is formatted appropriately to "POSIXt" format. We can do this using as.POSIXct where we set our time zone. Best, we save the time zone in an string, we will use it two times.

    > my_tz <- "CET"  ## adjust this to your requirements!
    > df$DateTime <- as.POSIXct(as.character(df$DateTime), tz=my_tz)  ##*
    

    I use as.character here to prevent an accidentally entered time zone from changing the times.

    Next, we generate the first changeover to 0900 AM,

    > (chg1 <- as.POSIXct(paste(as.Date(df$DateTime)[1], "09:00:00"), tz=my_tz))
    [1] "2024-05-06 09:00:00 CEST"
    

    which we will use to create a sequence of changeovers.

    > (dsq <- seq.POSIXt(chg1 - 60^2*24, by='1 day', 
    +                   length.out=length(unique(as.Date(df$DateTime))) + 2L))
     [1] "2024-05-05 09:00:00 CEST" "2024-05-06 09:00:00 CEST" "2024-05-07 09:00:00 CEST"
     [4] "2024-05-08 09:00:00 CEST" "2024-05-09 09:00:00 CEST" "2024-05-10 09:00:00 CEST"
     [7] "2024-05-11 09:00:00 CEST" "2024-05-12 09:00:00 CEST" "2024-05-13 09:00:00 CEST"
    [10] "2024-05-14 09:00:00 CEST"
    

    "POSIXt" format stores the time internally as seconds since "1970-01-01 00:00:00" aka "unix epoch". So if we subtract 60^2*24, the number of seconds of a day, we get a day earlier. cut.POSIXt which we will use in the final step needs a lower and an upper bound, so we add one element at each end.

    Finally we use the changeovers as breaks= to cut the DateTimes into Dates.

    > transform(df, date=as.Date(as.character(cut.POSIXt(DateTime, breaks=dsq))))
                  DateTime Temp       date
    1  2024-05-06 08:23:01   12 2024-05-05
    2  2024-05-06 09:27:01   10 2024-05-06
    3  2024-05-06 12:53:31    5 2024-05-06
    4  2024-05-07 01:28:16   30 2024-05-06
    5  2024-05-07 05:07:59   30 2024-05-06
    6  2024-05-07 09:36:48    9 2024-05-07
    7  2024-05-08 08:23:01   11 2024-05-07
    8  2024-05-08 10:27:01   12 2024-05-08
    9  2024-05-08 12:53:31    6 2024-05-08
    10 2024-05-09 01:28:16   31 2024-05-08
    11 2024-05-09 05:09:59   32 2024-05-08
    12 2024-05-09 11:36:48   10 2024-05-09
    13 2024-05-10 08:23:01   11 2024-05-09
    14 2024-05-10 10:27:01   12 2024-05-10
    15 2024-05-10 12:53:31    6 2024-05-10
    16 2024-05-12 01:28:16   31 2024-05-11
    17 2024-05-12 05:09:59   32 2024-05-11
    18 2024-05-12 09:36:48   10 2024-05-12
    19 2024-05-13 01:28:16   10 2024-05-12
    20 2024-05-13 05:09:59   11 2024-05-12
    21 2024-05-13 11:36:48   12 2024-05-13
    

    Data

    > dput(df)
    structure(list(DateTime = c("2024-05-06 08:23:01", "2024-05-06 09:27:01", 
    "2024-05-06 12:53:31", "2024-05-07 01:28:16", "2024-05-07 05:07:59", 
    "2024-05-07 09:36:48", "2024-05-08 08:23:01", "2024-05-08 10:27:01", 
    "2024-05-08 12:53:31", "2024-05-09 01:28:16", "2024-05-09 05:09:59", 
    "2024-05-09 11:36:48", "2024-05-10 08:23:01", "2024-05-10 10:27:01", 
    "2024-05-10 12:53:31", "2024-05-12 01:28:16", "2024-05-12 05:09:59", 
    "2024-05-12 09:36:48", "2024-05-13 01:28:16", "2024-05-13 05:09:59", 
    "2024-05-13 11:36:48"), Temp = c(12L, 10L, 5L, 30L, 30L, 9L, 
    11L, 12L, 6L, 31L, 32L, 10L, 11L, 12L, 6L, 31L, 32L, 10L, 10L, 
    11L, 12L)), class = "data.frame", row.names = c(NA, -21L))