I have GPS tag data which is for over night locations only. This has been filtered from a main data set already which calculated day and night based on sunset and sunrise times. As all day locations have been removed, night could be classed as from 3pm to 10am to allow for seasonal changes and make sure all data points are included.
The data is currently formatted with normal dates and times, e.g. 2019-03-22 21:53:00 and 2019-03-23 02:46:00. However, this data is technically from the same night. I want to add a new column to my data frame in R which groups data per night (rather than on their 'true' date) and assigns the dates/times after midnight to that of the starting date. So with the two dates above, they'd both be 2019-03-22.
So far I have tried the following code but it's making mistakes and the output column is a number (17976) rather than a date:
# Convert DateAndTime column to POSIXct format
sample$DateTime <- as.POSIXct(sample$DateTime, format = "%d/%m/%Y %H:%M")
# Extract date component
sample$Date <- as.Date(sample$DateTime)
# Assign the night date
sample$NightDate <- ifelse(hour(sample$DateTime) < 8,
sample$Date - days(1),
sample$Date)
# change NightDate from numbers (17976) back to a date
as.Date(sample$NightDate)
# error which occurs: Error in as.Date.numeric(GE.sample$NightDate) : 'origin' must be supplied
Due to data protection I can't share much of my data, but here's an example of the DateTime column which I'm working off:
# DateTime Data set
2019-03-22 00:34:00
2019-03-22 02:43:00
2019-03-22 04:56:00
2019-03-22 19:28:00
2019-03-22 21:53:00
2019-03-23 02:46:00
2019-03-23 20:32:00
2019-03-23 23:04:00
2019-03-24 01:40:00
2019-03-24 04:17:00
2019-03-24 19:34:00
2019-03-24 23:22:00
2019-03-25 02:03:00
2019-03-25 04:48:00
I'm quite new to R so any help would be appreciated, thank you for your time!
I think there are a couple of problems here.
You call as.Date(.)
but don't reassign it back to the frame. This may not be a problem yet because of your error (#2), but know that you need to do
sample$NightDate <- as.Date(...)
A number in date-objects is the number of days since an "origin". Most computing (I believe) including R uses 1970-01-01
, but Excel uses 1899-12-30
. R will not assume this for you (despite many people really wanting it to default to the "epoch" origin of 1970-01-01
), you need to specify it. Try
sample$NightDate <- as.Date(sample$NightDate, origin = "1970-01-01")
It can be a date object as.Date("1970-01-01")
or something that is unambiguously a date, as what I did above.
ifelse
drops class, it's a known thing, unfortunately. You can avoid dropping the class by removing the Date
object from inside of ifelse
's return values. Perhaps
sample$NightDate <- sample$DateTime - ifelse(hour(sample$DateTime) < 8, 86400, 0)
where 86400
is the number of seconds in one day. (It should be noted that timestamp/POSIXt
objects have units of 1 sec
and date/Date
objects have units of 1 day
.
Bottom line, starting with just DateTime
(as a POSIXt
timestamp), you can get NightDate
with
sample |>
transform(
NightDate = as.Date(trunc(
DateTime - ifelse(lubridate::hour(DateTime) < 8, 86400, 0),
by = "days"))
)
# DateTime NightDate
# 1 2019-03-22 00:34:00 2019-03-21
# 2 2019-03-22 02:43:00 2019-03-21
# 3 2019-03-22 04:56:00 2019-03-21
# 4 2019-03-22 19:28:00 2019-03-22
# 5 2019-03-22 21:53:00 2019-03-22
# 6 2019-03-23 02:46:00 2019-03-22
# 7 2019-03-23 20:32:00 2019-03-23
# 8 2019-03-23 23:04:00 2019-03-23
# 9 2019-03-24 01:40:00 2019-03-23
# 10 2019-03-24 04:17:00 2019-03-23
# 11 2019-03-24 19:34:00 2019-03-24
# 12 2019-03-24 23:22:00 2019-03-24
# 13 2019-03-25 02:03:00 2019-03-24
# 14 2019-03-25 04:48:00 2019-03-24
The lubridate::hour
function is useful. If you aren't using lubridate
for anything else and need to reduce your use of other packages, you can use as.integer(format(DateTime, format="%H"))
or as.POSIXlt(sample$DateTime)$hour
, both are identical in effect (though lubridate::hour
has a "fancy" way to render on the console).
Date
sample <- structure(list(DateTime = structure(c(1553229240, 1553236980, 1553244960, 1553297280, 1553305980, 1553323560, 1553387520, 1553396640, 1553406000, 1553415420, 1553470440, 1553484120, 1553493780, 1553503680), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA, -14L), class = "data.frame")