rdatedatetimetimedata-conversion

How to convert a datetime to date in R, WITHOUT rounding the day?


I'm using R and trying to convert a datetime field into just the date? R gives me the desired format but keeps rounding up some of the day values. Specifically everything after 12 noon! I could not find any threads that address this exact problem. I actually figured out a solution but wanted to post the question because I spent a whole week troubleshooting.

#Convert the datetime field from character to a datetime
main_df$datetime <- strptime(main_df$ï..Date, format = "%m/%d/%Y %H:%M")
main_df$datetime <- as.POSIXct(main_df$datetime, tz = Sys.timezone())
head(main_df$datetime)
class(main_df$datetime)

#Remove the poorly computer-titled character field that contained datetime info
main_df <- subset(main_df, select = -c(ï..Date))

#Use the NEW datetime field to create a date field
#main_df$Date <- trunc(main_df$datetime,"days")
main_df$Date <- as.Date(main_df$datetime, format = "%m/%d/%Y")
?as.Date()
class(main_df$Date)

head(main_df$Date)

That returned:

head(main_df$datetime)
[1] "2020-05-16 00:31:00 CDT" "2020-05-16 00:30:00 CDT" "2020-05-15 23:33:00 CDT" "2020-05-15 15:33:00 CDT"
[5] "2020-05-15 22:31:00 CDT" "2020-05-15 22:12:00 CDT"

and

> class(main_df$Date)
[1] "Date"
> 
> head(main_df$Date)
[1] "2020-05-16" "2020-05-16" "2020-05-16" "2020-05-15" "2020-05-16" "2020-05-16"

Notice how the last 4 values for 'Date' should be 2020-05-15 but instead, they are converted to be 2020-05-16. So what are some other ways to fix this? I'm going to post one way that worked but I doubt it's the cleanest.


Solution

  • Do your input dates include a timezone specification? If not, they are ambiguous and the rounding may be right or it may be wrong. If they do include a timezone specification, the lubridate package should handle them correctly.

    I would advise against using tz = Sys.timezone() because that would make the interaction between input data and algorithm dependent on geography if your inputs don't include a timezone specification, so what works for you might not work for a different user in a different location.