
Getting an error when defining timezone for R sunrise function in bioRad package


I have a dataframe of locations and dates across the US for which I want to retrieve the time of sunrise. I used tz_lookup_coords from the lutz package to look up the timezone for each location, but when I pass the result to bioRad's sunrise function I get an error message saying that the tz value is invalid.

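# load packages (tz_lookup_coords is from lutz, sunrise is from bioRad)
library(lutz)
library(bioRad)

# create dataframe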
data <- data.frame(id = 1:10,
                   date = c("2018-02-05", "2018-12-29", "2018-05-25", "2018-02-19", 
                            "2017-02-09", "2017-10-05", "2018-02-18",
                            "2017-11-27", "2017-10-13", "2018-12-03"),
                   wgs_x = c(-105.12782,  -73.70111, -119.89776, -157.94036,  
                             -85.62744,  -87.73867,  -90.03440 , -97.39539,
                             -112.34498,  -83.06242),
                   wgs_y = c(39.98948, 41.03264, 36.84011, 21.33720, 42.88368,
                             30.42648, 35.20090, 27.68490, 34.62111, 42.39886))
data$date <- as.Date(data$date)

# define timezones
data$timezone <- tz_lookup_coords(data$wgs_y, data$wgs_x, method = "accurate", warn = FALSE)

# define sunrise time
data$sunrise <- sunrise(date = data$date, 
                        lon = data$wgs_x,
                        lat = data$wgs_y,
                        tz = data$timezone)

Solution

  • Without knowing exactly which error you're seeing, I'll try to reproduce it based on what bioRad::sunrise does internally, namely as.POSIXct(date, tz).

    data$timezone <- lutz::tz_lookup_coords(data$wgs_y, data$wgs_x, method = "accurate", warn = FALSE)
    data$timezone
    #  [1] "America/Denver"      "America/New_York"    "America/Los_Angeles" "Pacific/Honolulu"    "America/Detroit"     "America/Chicago"     "America/Chicago"     "America/Chicago"     "America/Phoenix"     "America/Detroit"    
    

    So far so good. However, based on https://github.com/adokter/bioRad/blob/master/R/sunrise_sunset.R#L129, I'll try

    as.POSIXct(data$date, tz = data$timezone)
    # Error in strptime(xx, f, tz = tz) : invalid 'tz' value
    

    (For future questions, it helps a lot to include the actual error message. In this case, it would have pointed much sooner to base R functions like as.POSIXct or strptime, without having to slog through other possible causes.)

    What is not well documented in ?as.POSIXct

          tz: a character string.  The time zone specification to be used
              for the conversion, _if one is required_.  System-specific
              (see time zones), but ‘""’ is the current time zone, and
              ‘"GMT"’ is UTC (Universal Time, Coordinated).  Invalid values
              are most commonly treated as UTC, on some platforms with a
              warning.
    

    is that tz= must be length 1. This is because all POSIXt values in an R vector must share the same timezone. That is, one cannot have two timestamps with different timezones in one vector: the "tzone" attribute applies to the vector as a whole. (A column of a frame is just a vector.)
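
    As a small illustration of that point (a sketch with two made-up timestamps, not the question's data): the "tzone" attribute is a single string for the whole vector, and changing it only changes how every element is displayed, not the underlying instants.

    ```r
    # two timestamps, both parsed as local time in America/Denver
    x <- as.POSIXct(c("2018-02-05 00:00:00", "2018-12-29 00:00:00"),
                    tz = "America/Denver")
    attr(x, "tzone")    # one value for the whole vector

    # re-label the whole vector with a different timezone
    y <- x
    attr(y, "tzone") <- "America/New_York"

    # same instants (seconds since the epoch), different display
    all(as.numeric(x) == as.numeric(y))
    format(y, "%H:%M")  # midnight in Denver is 02:00 in New York on these dates
    ```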

    To help prove this point,

    as.POSIXct(data$date, tz = data$timezone[1])
    #  [1] "2018-02-05 MST" "2018-12-29 MST" "2018-05-25 MDT" "2018-02-19 MST" "2017-02-09 MST" "2017-10-05 MDT" "2018-02-18 MST" "2017-11-27 MST" "2017-10-13 MDT" "2018-12-03 MST"
    

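    Though, depending on your data, that might alter some time values: every date is interpreted as midnight in the first row's timezone, which is not midnight local time for rows in other timezones.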

    Another approach would be to convert each per their specific timezones, and then combine them together. A first stab might use mapply, but this tends to strip the class:

    mapply(as.POSIXct, data$date, tz = data$timezone)
    # 2018-02-05 2018-12-29 2018-05-25 2018-02-19 2017-02-09 2017-10-05 2018-02-18 2017-11-27 2017-10-13 2018-12-03 
    # 1517814000 1546059600 1527231600 1519034400 1486616400 1507179600 1518933600 1511762400 1507878000 1543813200 
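
    The numbers that mapply returns are seconds since 1970-01-01 UTC, so the class can also be restored by hand. A minimal sketch, assuming character dates and using two of the timezones from above; origin= is passed explicitly so that it should also work on older R versions:

    ```r
    dates <- c("2018-02-05", "2018-12-29")
    zones <- c("America/Denver", "America/New_York")

    # mapply simplifies the result to a plain numeric vector, dropping the POSIXct class
    secs <- mapply(as.POSIXct, dates, tz = zones)
    inherits(secs, "POSIXct")   # FALSE
    unname(secs)                # 1517814000 1546059600, as in the output above

    # rebuild a POSIXct vector; it can carry only one display timezone
    restored <- as.POSIXct(unname(secs), origin = "1970-01-01", tz = "America/Denver")
    format(restored, "%Y-%m-%d %H:%M")
    # "2018-02-05 00:00" "2018-12-28 22:00"
    ```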
    

    We can fix that by using Map and do.call(c, ..):

    do.call(c, Map(as.POSIXct, data$date, tz = data$timezone))
    #                2018-02-05                2018-12-29                2018-05-25                2018-02-19                2017-02-09                2017-10-05                2018-02-18                2017-11-27                2017-10-13 
    # "2018-02-05 00:00:00 MST" "2018-12-28 22:00:00 MST" "2018-05-25 01:00:00 MDT" "2018-02-19 03:00:00 MST" "2017-02-08 22:00:00 MST" "2017-10-04 23:00:00 MDT" "2018-02-17 23:00:00 MST" "2017-11-26 23:00:00 MST" "2017-10-13 01:00:00 MDT" 
    #                2018-12-03 
    # "2018-12-02 22:00:00 MST" 
    

    Note that R tends to think of dates as UTC, and then converting to timezone-based timestamps does produce different times. Since, as I said earlier, all POSIXt values in a vector must all share the same timezone, all of these are converted to the time in the first timezone, though you can clearly see that the time-of-day is different for each.

    Having said that ... each of these is still midnight on the given date in its own timezone, merely displayed in the first timezone. If you reset the timezone of each of those timestamps from MST/MDT to the one returned by tz_lookup_coords, they display as midnight again:

    
    do.call(c, Map(as.POSIXct, data$date, tz = data$timezone)) |>
      Map(f = function(tm, tz) `attr<-`(tm, "tzone", tz), data$timezone)
    # $`2018-02-05`
    # [1] "2018-02-05 MST"
    # $`2018-12-29`
    # [1] "2018-12-29 EST"
    # $`2018-05-25`
    # [1] "2018-05-25 PDT"
    # $`2018-02-19`
    # [1] "2018-02-19 HST"
    # $`2017-02-09`
    # [1] "2017-02-09 EST"
    # $`2017-10-05`
    # [1] "2017-10-05 CDT"
    # $`2018-02-18`
    # [1] "2018-02-18 CST"
    # $`2017-11-27`
    # [1] "2017-11-27 CST"
    # $`2017-10-13`
    # [1] "2017-10-13 MST"
    # $`2018-12-03`
    # [1] "2018-12-03 EST"
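
    If the per-row timezone needs to travel with the data, one option is to keep such a list as a list-column, or to store the formatted local times as character. A sketch under these assumptions: local_midnight is a hypothetical helper (not part of lutz or bioRad), and the dates are converted to character so that as.POSIXct interprets each one in its own timezone.

    ```r
    # hypothetical helper: parse each date in its own timezone, one POSIXct per element
    local_midnight <- function(date, tz) {
      stopifnot(length(date) == length(tz))
      unname(Map(function(d, z) as.POSIXct(d, tz = z), as.character(date), tz))
    }

    dates <- as.Date(c("2018-02-05", "2018-12-29", "2018-02-19"))
    zones <- c("America/Denver", "America/New_York", "Pacific/Honolulu")

    res <- local_midnight(dates, zones)

    # each element keeps its own "tzone" attribute
    vapply(res, attr, character(1), which = "tzone")

    # character representation, safe to store in a regular data frame column
    vapply(res, format, character(1), format = "%Y-%m-%d %H:%M")
    ```

    The list form keeps real timestamps that can still be compared or subtracted; the character form is easier to print and export but loses the instant.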