I am encountering an error in R when trying to loop over time. Here is a subset of my data frame (which has 120,000 rows):
time value mean group
1 2017-01-01 12:00:00 0.507 0.5106533 NA
2 2017-01-01 12:05:00 0.526 0.5106533 NA
3 2017-01-01 12:10:00 0.489 0.5106533 NA
4 2017-01-01 12:15:00 0.598 0.5106533 NA
5 2017-01-01 12:20:00 0.564 0.5106533 NA
6 2017-01-01 12:25:00 0.536 0.5106533 NA
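To make the example reproducible, the six sample rows above can be rebuilt as a data frame. This is a minimal sketch that assumes `time` is stored as POSIXct in GMT and that `group` starts as a character column of NAs.

```r
# Rebuild the six sample rows as a reproducible data frame.
# Assumption: time is POSIXct in GMT; group starts as character NA.
merged.data <- data.frame(
  time  = seq(as.POSIXct("2017-01-01 12:00:00", tz = "GMT"),
              by = "5 min", length.out = 6),
  value = c(0.507, 0.526, 0.489, 0.598, 0.564, 0.536),
  mean  = 0.5106533,
  group = NA_character_
)
str(merged.data)
```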
Let's say I want to create groups based on time periods, with an expected result like this:
time value mean group
1 2017-01-01 12:00:00 0.507 0.5106533 A
2 2017-01-01 12:05:00 0.526 0.5106533 A
3 2017-01-01 12:10:00 0.489 0.5106533 B
4 2017-01-01 12:15:00 0.598 0.5106533 B
5 2017-01-01 12:20:00 0.564 0.5106533 C
6 2017-01-01 12:25:00 0.536 0.5106533 C
I tried the following code:
for (i in 1:length(merged.data$group)) {
  if (merged.data[as.POSIXlt(i)$time >= "2017-05-15 12:00:00 GMT" &
                  as.POSIXlt(i)$time <= "2017-05-29 12:00:00 GMT", ]) {
    merged.data$group == "A"
  } else if (merged.data[as.POSIXlt(i)$time >= "2017-08-11 12:00:00" &
                         as.POSIXlt(i)$time <= "2017-11-29 16:00:00", ]) {
    merged.data$group == "B"
  } else if (merged.data[as.POSIXlt(i)$time >= "2018-01-05 12:00:00" &
                         as.POSIXlt(i)$time <= "2018-02-16 16:00:00", ]) {
    merged.data$group == "C"
  }
}
I get the following error:
Error in as.POSIXlt.numeric(i) : 'origin' must be supplied
I don't understand this: I thought POSIXlt got rid of origin problems. Admittedly, my understanding of dates and times in R is a bit confused, and I struggle every time I have to write code that deals with times/dates.
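The error comes from `as.POSIXlt(i)`: `i` is the loop counter (1, 2, 3, ...), not a time. Turning a plain number into a date-time means counting seconds from an origin, and R versions before 4.3.0 refuse to guess that origin, hence `'origin' must be supplied` (newer versions default to 1970-01-01). The sketch below supplies an origin explicitly to show what the conversion actually produces, and lists the components of a POSIXlt object, none of which is called `time`.

```r
i <- 1L  # the first value of the loop counter

# Supplying an origin works in every R version, but the result is simply
# "1 second after 1970-01-01", not the timestamp stored in row 1.
t_from_index <- as.POSIXlt(i, origin = "1970-01-01", tz = "GMT")
format(t_from_index)
# "1970-01-01 00:00:01"

# Component names of a POSIXlt object (sec, min, hour, mday, ...);
# there is no "time" component for $time to extract.
names(unclass(t_from_index))
```

In other words, the comparison needs the row's own timestamp (for example `merged.data$time[i]`), not a conversion of the index `i`.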
I hope someone can help. Please tell me if anything is unclear or if more (or better) information is needed to answer the question.
Thanks in advance!
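For reference, here is a minimal base-R sketch of the loop from the question with its bugs fixed. It assumes `merged.data$time` is already POSIXct in GMT and uses a small hypothetical `merged.data` (one timestamp in each period plus one outside all of them). It reads the i-th timestamp instead of converting the counter `i`, assigns with `<-` rather than comparing with `==`, and writes only to row `i`.

```r
# Hypothetical toy data: one timestamp inside each period, one outside all of them.
merged.data <- data.frame(
  time  = as.POSIXct(c("2017-05-20 08:00:00", "2017-09-01 10:00:00",
                       "2018-02-01 09:00:00", "2017-01-01 12:00:00"), tz = "GMT"),
  value = c(0.507, 0.526, 0.489, 0.598)
)
merged.data$group <- NA_character_

# Helper that parses a boundary string in the same time zone as the data.
gmt <- function(x) as.POSIXct(x, tz = "GMT")

for (i in seq_len(nrow(merged.data))) {
  tm <- merged.data$time[i]  # the i-th timestamp, not the integer i itself
  if (tm >= gmt("2017-05-15 12:00:00") && tm <= gmt("2017-05-29 12:00:00")) {
    merged.data$group[i] <- "A"  # assign with <-, do not compare with ==
  } else if (tm >= gmt("2017-08-11 12:00:00") && tm <= gmt("2017-11-29 16:00:00")) {
    merged.data$group[i] <- "B"
  } else if (tm >= gmt("2018-01-05 12:00:00") && tm <= gmt("2018-02-16 16:00:00")) {
    merged.data$group[i] <- "C"
  }
}
merged.data$group
# "A" "B" "C" NA
```

A per-row loop like this works but can be slow on 120,000 rows; the data.table approaches below avoid looping over rows.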
A data.table approach.
Sample data:
library( data.table )
dt <- fread("time value mean
2017-01-01T12:00:00 0.507 0.5106533
2017-01-01T12:05:00 0.526 0.5106533
2017-01-01T12:10:00 0.489 0.5106533
2017-01-01T12:15:00 0.598 0.5106533
2017-01-01T12:20:00 0.564 0.5106533
2017-01-01T12:25:00 0.536 0.5106533 ", header = TRUE)
dt[, time := as.POSIXct( time, format = "%Y-%m-%dT%H:%M:%S" )]
Code:
library( data.table )
library( lubridate )
# one letter per 10-minute bin; LETTERS has only 26 entries, so bins after the 26th get NA
dt[, group := LETTERS[.GRP], by = lubridate::floor_date( time, "10 mins" ) ]
# time value mean group
# 1: 2017-01-01 12:00:00 0.507 0.5106533 A
# 2: 2017-01-01 12:05:00 0.526 0.5106533 A
# 3: 2017-01-01 12:10:00 0.489 0.5106533 B
# 4: 2017-01-01 12:15:00 0.598 0.5106533 B
# 5: 2017-01-01 12:20:00 0.564 0.5106533 C
# 6: 2017-01-01 12:25:00 0.536 0.5106533 C
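Without lubridate, base R's `cut()` can produce the same 10-minute bins. This sketch is an alternative, not the answer's method: `cut()` starts its bins at the first timestamp truncated to the minute, whereas `floor_date()` aligns them on the clock (12:00, 12:10, ...), so the two agree here but can differ when the data does not start on a multiple of 10 minutes.

```r
# Six sample timestamps, five minutes apart (same as the sample data above).
time <- seq(as.POSIXct("2017-01-01 12:00:00", tz = "GMT"),
            by = "5 min", length.out = 6)

# cut() bins the times into 10-minute intervals; as.integer() numbers the
# bins 1, 2, 3, ... in chronological order.
bin <- as.integer(cut(time, breaks = "10 min"))

# Letter labels only work for up to 26 bins; the integer bin is a safer group id.
group <- LETTERS[bin]
group
# "A" "A" "B" "B" "C" "C"
```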
Approach using foverlaps, based on the provided sample data and code:
library( data.table )
# create a lookup table with the periods (taken from the question's loop) and their group names
periods.dt <- data.table(
start = as.POSIXct( c( "2017-05-15 12:00:00", "2017-08-11 12:00:00", "2018-01-05 12:00:00" ), tz = "GMT" ),
stop = as.POSIXct( c( "2017-05-29 12:00:00", "2017-11-29 16:00:00", "2018-02-16 16:00:00" ), tz = "GMT" ),
group = LETTERS[1:3] )
# key the lookup table on its interval columns (foverlaps requires a keyed y)
setkey( periods.dt, start, stop )
#create sample data
dt <- fread("time value mean
2017-01-01T12:00:00 0.507 0.5106533
2017-01-01T12:05:00 0.526 0.5106533
2017-01-01T12:10:00 0.489 0.5106533
2017-01-01T12:15:00 0.598 0.5106533
2017-01-01T12:20:00 0.564 0.5106533
2017-01-01T12:25:00 0.536 0.5106533 ", header = TRUE)
dt[, time := as.POSIXct( time, format = "%Y-%m-%dT%H:%M:%S", tz = "GMT" )]
# create zero-length intervals (start = stop = time) to join on
dt[, `:=`( start = time, stop = time )]
# overlap join: match each row to the period it lies within; no match gives NA
foverlaps( dt, periods.dt, type = "within", nomatch = NA)[, c("time", "value","mean","group"), with = FALSE]
# time value mean group
# 1: 2017-01-01 12:00:00 0.507 0.5106533 <NA>
# 2: 2017-01-01 12:05:00 0.526 0.5106533 <NA>
# 3: 2017-01-01 12:10:00 0.489 0.5106533 <NA>
# 4: 2017-01-01 12:15:00 0.598 0.5106533 <NA>
# 5: 2017-01-01 12:20:00 0.564 0.5106533 <NA>
# 6: 2017-01-01 12:25:00 0.536 0.5106533 <NA>
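All six sample rows fall on 2017-01-01, before the first period starts on 2017-05-15, which is why every group is NA. For readers without data.table, the same "which period contains this time" lookup can be sketched in base R with `findInterval()`. This is a swapped-in technique, not the answer's method; it assumes the periods are sorted by start, non-overlapping, and inclusive at both ends, and the test timestamps are hypothetical.

```r
# Period table from the question's loop (assumption: GMT, inclusive bounds,
# sorted by start, non-overlapping).
periods <- data.frame(
  start = as.POSIXct(c("2017-05-15 12:00:00", "2017-08-11 12:00:00",
                       "2018-01-05 12:00:00"), tz = "GMT"),
  stop  = as.POSIXct(c("2017-05-29 12:00:00", "2017-11-29 16:00:00",
                       "2018-02-16 16:00:00"), tz = "GMT"),
  group = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical timestamps: one before all periods, one in each period, one in a gap.
time <- as.POSIXct(c("2017-01-01 12:00:00", "2017-05-20 08:00:00",
                     "2017-09-01 10:00:00", "2018-02-01 09:00:00",
                     "2017-12-15 00:00:00"), tz = "GMT")

# For each time, the index of the last period starting at or before it
# (0 if the time precedes every period).
idx <- findInterval(as.numeric(time), as.numeric(periods$start))

group <- rep(NA_character_, length(time))
hit <- idx > 0
# Keep only times that are also at or before that period's stop.
hit[hit] <- time[hit] <= periods$stop[idx[hit]]
group[hit] <- periods$group[idx[hit]]
group
# NA "A" "B" "C" NA
```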