I need to get the list of temperatures from a weather station from 9am to 9am the next day in R and set their dates to the second day. A simplified version of the data looks like
DateTime | Temp |
---|---|
2024-05-06 08:23:01 | 12 |
2024-05-06 09:27:01 | 10 |
2024-05-06 12:53:31 | 5 |
2024-05-07 01:28:16 | 30 |
2024-05-07 05:07:59 | 30 |
2024-05-07 09:36:48 | 9 |
I would like to produce a dataframe of only the 9am:9am values so that
DateTime | Temp |
---|---|
2024-05-07 | 10 |
2024-05-07 | 5 |
2024-05-07 | 30 |
2024-05-07 | 30 |
My function to do this is
subnine=function(esheet){
mydates<-unique(as.Date(esheet$DateTime))
sheet9<-esheet[0,]
esheet$DateTime<-as.POSIXlt(esheet$DateTime)
for (l in 2:length(mydates))
{
dateslice<-subset(esheet,(as.Date(esheet$DateTime)==mydates[l-1]&esheet$DateTime$hour>=9)|(as.Date(esheet$DateTime)==mydates[l] & esheet$Datetime$hour<9))
dateslice$DateTime<-mydates[l]+1
sheet9<-rbind(sheet9,dateslice)
}
return(sheet9)
}
It is returning only the pre 9am values for the second day, so
DateTime | Temp |
---|---|
2024-05-07 | 30 |
2024-05-07 | 30 |
What is preventing the first part of the or(|) statement in my subset from selecting the first day's post 9am values? The logical statement works outside of the function/loop.
First, make sure that DateTime
is formatted appropriately to "POSIXt"
format. We can do this using as.POSIXct
where we set our time zone. Best, we save the time zone in an string, we will use it two times.
> my_tz <- "CET" ## adjust this to your requirements!
> df$DateTime <- as.POSIXct(as.character(df$DateTime), tz=my_tz) ##*
I use as.character
here to prevent an accidentally entered time zone from changing the times.
Next, we generate the first changeover to 0900 AM,
> (chg1 <- as.POSIXct(paste(as.Date(df$DateTime)[1], "09:00:00"), tz=my_tz))
[1] "2024-05-06 09:00:00 CEST"
which we will use to create a seq
uence of changeovers.
> (dsq <- seq.POSIXt(chg1 - 60^2*24, by='1 day',
+ length.out=length(unique(as.Date(df$DateTime))) + 2L))
[1] "2024-05-05 09:00:00 CEST" "2024-05-06 09:00:00 CEST" "2024-05-07 09:00:00 CEST"
[4] "2024-05-08 09:00:00 CEST" "2024-05-09 09:00:00 CEST" "2024-05-10 09:00:00 CEST"
[7] "2024-05-11 09:00:00 CEST" "2024-05-12 09:00:00 CEST" "2024-05-13 09:00:00 CEST"
[10] "2024-05-14 09:00:00 CEST"
"POSIXt"
format stores the time internally as seconds since "1970-01-01 00:00:00"
aka "unix epoch". So if we subtract 60^2*24
, the number of seconds of a day, we get a day earlier. cut.POSIXt
which we will use in the final step needs a lower and an upper bound, so we add one element at each end.
Finally we use the changeovers as breaks=
to cut
the DateTimes into Dates.
> transform(df, date=as.Date(as.character(cut.POSIXt(DateTime, breaks=dsq))))
DateTime Temp date
1 2024-05-06 08:23:01 12 2024-05-05
2 2024-05-06 09:27:01 10 2024-05-06
3 2024-05-06 12:53:31 5 2024-05-06
4 2024-05-07 01:28:16 30 2024-05-06
5 2024-05-07 05:07:59 30 2024-05-06
6 2024-05-07 09:36:48 9 2024-05-07
7 2024-05-08 08:23:01 11 2024-05-07
8 2024-05-08 10:27:01 12 2024-05-08
9 2024-05-08 12:53:31 6 2024-05-08
10 2024-05-09 01:28:16 31 2024-05-08
11 2024-05-09 05:09:59 32 2024-05-08
12 2024-05-09 11:36:48 10 2024-05-09
13 2024-05-10 08:23:01 11 2024-05-09
14 2024-05-10 10:27:01 12 2024-05-10
15 2024-05-10 12:53:31 6 2024-05-10
16 2024-05-12 01:28:16 31 2024-05-11
17 2024-05-12 05:09:59 32 2024-05-11
18 2024-05-12 09:36:48 10 2024-05-12
19 2024-05-13 01:28:16 10 2024-05-12
20 2024-05-13 05:09:59 11 2024-05-12
21 2024-05-13 11:36:48 12 2024-05-13
Data
> dput(df)
structure(list(DateTime = c("2024-05-06 08:23:01", "2024-05-06 09:27:01",
"2024-05-06 12:53:31", "2024-05-07 01:28:16", "2024-05-07 05:07:59",
"2024-05-07 09:36:48", "2024-05-08 08:23:01", "2024-05-08 10:27:01",
"2024-05-08 12:53:31", "2024-05-09 01:28:16", "2024-05-09 05:09:59",
"2024-05-09 11:36:48", "2024-05-10 08:23:01", "2024-05-10 10:27:01",
"2024-05-10 12:53:31", "2024-05-12 01:28:16", "2024-05-12 05:09:59",
"2024-05-12 09:36:48", "2024-05-13 01:28:16", "2024-05-13 05:09:59",
"2024-05-13 11:36:48"), Temp = c(12L, 10L, 5L, 30L, 30L, 9L,
11L, 12L, 6L, 31L, 32L, 10L, 11L, 12L, 6L, 31L, 32L, 10L, 10L,
11L, 12L)), class = "data.frame", row.names = c(NA, -21L))