This question may stem from my ignorance of how date-times work, but I am struggling with time zone conversions of some logger data. I have multiple loggers which were read out at multiple time points. Unintentionally, the time zone setting was sometimes changed during readouts, so some periods for some loggers are recorded in GMT+1 and some in GMT+2 (constant offsets, with no switching for daylight saving time or similar). I would like to have them all in UTC+1 (i.e. GMT+1) so that they are comparable.
I created a data frame with all the measurements (temp) of all the loggers (loggerID) over the whole time period (date_time, a character string at the moment). I added a column specifying the time zone each measurement was recorded in (timezone, either "GMT+1" or "GMT+2"). My first try was to create a POSIXct date_time, conditionally telling the function to use either GMT+1 or GMT+2:
dat_temp <- dat_temp %>%
mutate(date_time_set = if_else(timezone == "GMT+1", as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "GMT+1"), as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "GMT+2")))
I found out that this doesn't work, as time zones have to be specified as places ("Europe/Paris", for example) or simply as "UTC" or "GMT", but without offsets. I can't specify a place in Europe, as this would assume the loggers switched for daylight saving time, right? I tried the same with lubridate, but ran into the same problem:
dat_temp <- dat_temp %>%
mutate(date_time_set = if_else(timezone == "GMT+1", ymd_hms(dat_temp$date_time, tz = "GMT+1"), ymd_hms(dat_temp$date_time, tz = "GMT+2")))
Then I thought I could just declare the time as UTC (which is strictly wrong) and then manually subtract 1 h from the GMT+2 data. That would bring everything to UTC+1, although it would be stored as UTC, which would still be wrong because the values are actually in UTC+1:
dat_temp <- dat_temp %>%
mutate(date_time_posix = as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"), # assumes UTC for everything if not specified otherwise
date_time_corr = if_else(timezone == "GMT+2", date_time_posix - (1*60*60), date_time_posix)) # manually subtract 1h for GMT+2 data
I also played around with with_tz() and force_tz() from lubridate, but to no avail. So my questions are:
Subset of data:
> dput(dat_temp)
structure(list(date_time = c("2021-07-01 00:00:00", "2021-07-01 00:30:00",
"2021-07-01 01:00:00", "2021-07-01 01:30:00", "2021-07-01 02:00:00",
"2021-07-01 02:30:00", "2021-07-01 03:00:00", "2021-07-01 03:30:00",
"2021-07-01 04:00:00", "2021-07-01 04:30:00", "2021-10-16 02:30:00",
"2021-10-16 03:00:00", "2021-10-16 03:30:00", "2021-10-16 04:00:00",
"2021-10-16 04:30:00", "2021-10-16 05:00:00", "2021-10-16 05:30:00",
"2021-10-16 06:00:00", "2021-10-16 06:30:00", "2021-10-16 07:00:00",
"2021-10-16 07:30:00", "2021-11-03 00:00:00", "2021-11-03 00:30:00",
"2021-11-03 01:00:00", "2021-11-03 01:30:00", "2021-11-03 02:00:00",
"2021-11-03 02:30:00", "2021-11-03 03:00:00", "2021-11-03 03:30:00",
"2021-11-03 04:00:00", "2021-11-03 04:30:00", "2021-11-03 05:00:00",
"2021-11-19 11:00:00", "2021-11-19 11:30:00", "2021-11-19 12:00:00",
"2021-11-19 12:30:00", "2021-11-19 13:00:00", "2021-11-19 13:30:00",
"2021-11-19 14:00:00", "2021-11-19 14:30:00", "2021-11-19 15:00:00",
"2021-11-19 15:30:00", "2021-11-19 16:00:00"), temp = c(16.427,
16.141, 15.951, 15.569, 15.282, 14.996, 14.9, 14.709, 14.517,
14.421, 4.727, 4.623, 4.519, 4.415, 4.311, 4.207, 4.102, 3.998,
3.893, 3.788, 3.683, 2.73, 2.624, 2.624, 2.624, 2.517, 2.517,
2.517, 2.517, 2.624, 2.73, 2.837, 0.674, 0.674, 0.784, 1.112,
1.872, 2.517, 2.943, 3.155, 3.155, 3.049, 2.73), loggerID = c("logger1",
"logger1", "logger1", "logger1", "logger1", "logger1", "logger1",
"logger1", "logger1", "logger1", "logger2", "logger2", "logger2",
"logger2", "logger2", "logger2", "logger2", "logger2", "logger2",
"logger2", "logger2", "logger1", "logger1", "logger1", "logger1",
"logger1", "logger1", "logger1", "logger1", "logger1", "logger1",
"logger1", "logger3", "logger3", "logger3", "logger3", "logger3",
"logger3", "logger3", "logger3", "logger3", "logger3", "logger3"
), timezone = c("GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2",
"GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2",
"GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2",
"GMT+2", "GMT+2", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1",
"GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1",
"GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1",
"GMT+1", "GMT+1", "GMT+1")), row.names = c(1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 142902L, 142903L, 142904L, 142905L, 142906L,
142907L, 142908L, 142909L, 142910L, 142911L, 142912L, 196225L,
196226L, 196227L, 196228L, 196229L, 196230L, 196231L, 196232L,
196233L, 196234L, 196235L, 533387L, 533388L, 533389L, 533390L,
533391L, 533392L, 533393L, 533394L, 533395L, 533396L, 533397L
), class = "data.frame")
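The solution should be lubridate's with_tz() (which only changes how a time is displayed) and force_tz() (which changes the time zone the clock reading belongs to, and therefore the instant itself).
For your case, here is some example code for your reference.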
library(lubridate)
library(dplyr)
# inspect the time zone and date-time locale of your R session
Sys.timezone() # system time zone
Sys.getlocale("LC_TIME") # locale used when displaying dates and times
# parse with ymd_hms(); without a tz argument the result is in UTC
t_default <- ymd_hms('2021-07-01 00:00:00')
# with_tz(): same instant, only displayed in another time zone
# note: 'Etc/GMT-1' means UTC+1 (the sign in Etc/GMT names is inverted)
t_withtz <- ymd_hms('2021-07-01 00:00:00') |> with_tz('Etc/GMT-1')
# force_tz(): same clock reading, reassigned to another time zone (the instant changes)
t_forcetz <- ymd_hms('2021-07-01 00:00:00') |> force_tz('Etc/GMT-1')
# check the difference
## no difference, because with_tz() only changes how the instant is displayed
t_default - t_withtz
## one hour difference, because force_tz() moved the instant
t_default - t_forcetz
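One detail that is easy to trip over is the sign in the Etc/GMT zone names: they follow the POSIX convention, so "Etc/GMT-1" is one hour ahead of UTC (what the question calls GMT+1) and "Etc/GMT-2" is two hours ahead. A small sketch to check this, assuming your system's time zone database contains these names (see OlsonNames()); the variable names are only for illustration:

```r
library(lubridate)

# The same clock reading, interpreted in two fixed-offset zones (no DST).
t_plus1 <- ymd_hms("2021-07-01 00:00:00", tz = "Etc/GMT-1")  # logger clock at UTC+1
t_plus2 <- ymd_hms("2021-07-01 00:00:00", tz = "Etc/GMT-2")  # logger clock at UTC+2

# Display both instants in UTC to make the offsets visible.
format(with_tz(t_plus1, "UTC"), "%Y-%m-%d %H:%M:%S")  # "2021-06-30 23:00:00"
format(with_tz(t_plus2, "UTC"), "%Y-%m-%d %H:%M:%S")  # "2021-06-30 22:00:00"
```

Because these zones have a constant offset, they avoid the daylight saving switches that a place name such as "Europe/Paris" would bring in.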
R for Data Science has a good chapter that explains the details: https://r4ds.had.co.nz/dates-and-times.html#time-zones
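Applied to a data frame like the one in the question, one possible approach is to turn each row into a true instant first and only then pick the display zone. This is a minimal sketch, not a tested solution for the full data set: dat_small is a hypothetical two-row stand-in for dat_temp, and offset_h, instant_utc and date_time_utc1 are made-up column names.

```r
library(dplyr)
library(lubridate)

# Hypothetical two-row stand-in for dat_temp from the question
dat_small <- data.frame(
  date_time = c("2021-07-01 00:00:00", "2021-11-03 00:00:00"),
  timezone  = c("GMT+2", "GMT+1")
)

dat_fixed <- dat_small %>%
  mutate(
    # hours the logger clock was ahead of UTC when the row was recorded
    offset_h = if_else(timezone == "GMT+2", 2, 1),
    # read the clock value as if it were UTC, then remove the offset
    # to obtain the real instant in UTC
    instant_utc = ymd_hms(date_time, tz = "UTC") - offset_h * 3600,
    # same instants, displayed in fixed UTC+1 ("Etc/GMT-1", sign inverted)
    date_time_utc1 = with_tz(instant_utc, "Etc/GMT-1")
  )

format(dat_fixed$date_time_utc1, "%Y-%m-%d %H:%M:%S")
# "2021-06-30 23:00:00" "2021-11-03 00:00:00"
```

The clock values end up the same as with the manual one-hour subtraction from the question, but the column now carries the UTC+1 zone instead of being labelled as UTC, so comparisons with other correctly zoned data stay consistent.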