I need to do previous tick aggregation on my tick data set for 5 minute intervals. Please note what I want to do is analogous to aggregateTrades() function in highfrequency package. But I need to solve this problem without using highfrequency package due to some other data handling issues. This is my data set:
dput(tt)
structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95,
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05,
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA,
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378,
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9,
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05,
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231,
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45,
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85,
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25,
231.15, 231, 231, 231, 231.05, NA), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459481850,
1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306,
1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314,
1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320,
1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328,
1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337,
1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347,
1459482348, 1459482349, 1459482350, 1459482351, 1459482354, 1459482355,
1459482356, 1459482357, 1459482358, 1459482359, 1459482362, 1459482363,
1459482364, 1459482369, 1459482370, 1459482371, 1459482372, 1459482373,
1459482378, 1459482379, 1459482380, 1459482382, 1459482388), tzone = "Asia/Calcutta", tclass = c("POSIXct",
"POSIXt")), .Dim = c(60L, 2L), .Dimnames = list(NULL, c("A",
"B")), class = c("xts", "zoo"))
This is my code for previous tick aggregation:
ag.5min.tt<-tt%>%filter(as.Date(index(tt)))%>%lapply(aggregate(by=cut(format(index(tt), format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))
What I am trying to do with the above code is to make 5 minute intervals each day for the prices of A and B. But I am getting error. Please suggest how to fix this error:
Error in UseMethod("filter_") :
no applicable method for 'filter_' applied to an object of class "c('xts', 'zoo')"
Thanks.
Edit: Converting the xts object to dataframe:
tt<-as.data.frame(tt)
tt<-data.frame(Time=rownames(tt), coredata(tt))
ag.5min.tt<-tt%>% filter(as.Date(index(tt)))%>%lapply(aggregate(by=cut(format(index(tt), format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))
New error:
Error in eval(substitute(expr), envir, enclos) :
filter condition does not evaluate to a logical vector.
Edit: attempt:
tt$Time<- as.POSIXct(tt$Time, format="%Y-%m-%d %H:%M:%S")
ag.5min.tt<-tt%>% group_by(Time==as.Date(tt$Time))%>%lapply(aggregate(by=cut(format(tt$Time, format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))
Error:
Error in cut.default(format(tt$Time, format = "%H:%M:%S"), breaks = "5 mins", :
'x' must be numeric
In addition: Warning message:
In eval(substitute(expr), envir, enclos) :
Incompatible methods ("Ops.POSIXt", "Ops.Date") for "=="
The result would look like this. Every five minute time stamp will have values for that particular time stamp or if there is an NA that time stamp will have last non- NA value for the stocks A and B
time A B
1 2016-04-01 09:00:00 NA NA
2 2016-04-01 09:05:00 NA NA
3 2016-04-01 09:10:00 NA NA
4 2016-04-01 09:15:00 1371.25 231.90
5 2016-04-01 09:20:00 1376.35 231.55
You can use .indexmin
to index your time-series by minutes and then manipulate that index to subset observations:
ind <- which(diff(.indexmin(tt) %% 5) == -4)
res <- tt[ind]
Here, .indexmin(tt) %% 5
will return the number of minutes since the last fifth minute. For our purposes, we want to extract from this the last index from each series of 4
s, which is the last observation for the minute preceding each fifth minute. To do that we can use diff
and just extract the the index for which there is a cross over from 4
to 0
(resulting in a diff
of -4
) using which
.
To illustrate, we modify your posted data to add observations that actually satisfy your extraction condition:
tt <- structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95,
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05,
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA,
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378,
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9,
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05,
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231,
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45,
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85,
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25,
231.15, 231, 231, 231, 231.05, NA), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459482299,
1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306,
1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314,
1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320,
1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328,
1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337,
1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347,
1459482348, 1459482349, 1459482590, 1459482591, 1459482594, 1459482595,
1459482596, 1459482597, 1459482598, 1459482599, 1459482602, 1459482603,
1459482604, 1459482609, 1459482610, 1459482611, 1459482612, 1459482613,
1459482618, 1459482619, 1459482620, 1459482622, 1459482628), tzone = "Asia/Calcutta", tclass = c("POSIXct",
"POSIXt")), .Dim = c(60L, 2L), .Dimnames = list(NULL,c("A",
"B")), class = c("xts", "zoo"))
## A B
##2016-04-01 09:14:59 1371.25 231.90
##2016-04-01 09:15:01 NA 231.85
##2016-04-01 09:15:02 1373.95 NA
##2016-04-01 09:15:03 NA 231.90
##2016-04-01 09:15:04 NA 231.85
##2016-04-01 09:15:05 1373.00 231.90
##2016-04-01 09:15:06 NA 231.80
##2016-04-01 09:15:07 1373.95 231.90
##2016-04-01 09:15:09 1373.90 232.60
##2016-04-01 09:15:10 NA 231.95
##2016-04-01 09:15:11 NA 232.35
##2016-04-01 09:15:12 1374.00 232.00
##2016-04-01 09:15:14 1374.15 232.10
##2016-04-01 09:15:15 NA 232.05
##2016-04-01 09:15:16 1374.00 232.05
##2016-04-01 09:15:17 1373.85 232.05
##2016-04-01 09:15:18 1372.55 231.50
##2016-04-01 09:15:19 1374.05 231.30
##2016-04-01 09:15:20 1374.15 NA
##2016-04-01 09:15:21 1374.75 NA
##2016-04-01 09:15:22 NA 231.10
##2016-04-01 09:15:23 NA 231.10
##2016-04-01 09:15:24 1375.90 231.10
##2016-04-01 09:15:26 1374.05 231.00
##2016-04-01 09:15:28 NA 231.00
##2016-04-01 09:15:29 NA 230.95
##2016-04-01 09:15:30 NA 230.60
##2016-04-01 09:15:31 NA 230.60
##2016-04-01 09:15:32 NA 230.70
##2016-04-01 09:15:36 NA 230.60
##2016-04-01 09:15:37 NA 231.00
##2016-04-01 09:15:38 1375.00 NA
##2016-04-01 09:15:39 NA 231.00
##2016-04-01 09:15:42 NA 231.00
##2016-04-01 09:15:44 NA 231.45
##2016-04-01 09:15:46 NA 231.65
##2016-04-01 09:15:47 NA 231.40
##2016-04-01 09:15:48 1376.35 231.70
##2016-04-01 09:15:49 NA 231.30
##2016-04-01 09:19:50 NA 231.25
##2016-04-01 09:19:51 NA 231.25
##2016-04-01 09:19:54 NA 231.40
##2016-04-01 09:19:55 NA 231.40
##2016-04-01 09:19:56 NA 231.85
##2016-04-01 09:19:57 NA 231.75
##2016-04-01 09:19:58 NA 231.50
##2016-04-01 09:19:59 NA 231.55
##2016-04-01 09:20:02 NA 231.35
##2016-04-01 09:20:03 1376.25 NA
##2016-04-01 09:20:04 NA 231.50
##2016-04-01 09:20:09 1378.00 231.50
##2016-04-01 09:20:10 1376.50 NA
##2016-04-01 09:20:11 NA 231.50
##2016-04-01 09:20:12 NA 231.25
##2016-04-01 09:20:13 NA 231.15
##2016-04-01 09:20:18 1378.00 231.00
##2016-04-01 09:20:19 1378.00 231.00
##2016-04-01 09:20:20 NA 231.00
##2016-04-01 09:20:22 NA 231.05
##2016-04-01 09:20:28 1378.80 NA
With this data, we get:
print(res)
## A B
##2016-04-01 09:14:59 1371.25 231.90
##2016-04-01 09:19:59 NA 231.55
To get the output you posted, you would need first generate a time series that have data (set to NA
) for every 5 minute ticks that you want. For this example, this time series (only for 5 minute ticks from 09:00
to 09:20
on 2016-04-01
) can be:
every.5.min <- structure(c(NA, NA, NA, NA, NA), .Dim = c(5L, 1L), .Dimnames = list(
NULL, "Empty"), index = structure(c(1459481400, 1459481700,
1459482000, 1459482300, 1459482600), tzone = "Asia/Calcutta", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta")
## Empty
##2016-04-01 09:00:00 NA
##2016-04-01 09:05:00 NA
##2016-04-01 09:10:00 NA
##2016-04-01 09:15:00 NA
##2016-04-01 09:20:00 NA
Then, merge
this with tt
:
tt <- merge(tt, every.5.min, all=TRUE)[,1:ncol(tt)]
The all=TRUE
will fill rows in the original tt
with NA
if that row (i.e., every 5 min) does not exist in tt
. Note after the merge, we only keep the columns from the original tt
.
Then, over tt
, fill all NA
with prior values as you did:
res <- do.call(merge, lapply(tt, na.locf))
Finally, extract only those rows for each 5 min tick using .indexmin
and .indexsec
:
res <- res[.indexmin(res) %% 5 == 0 & .indexsec(res) == 0]
## A B
##2016-04-01 09:00:00 NA NA
##2016-04-01 09:05:00 NA NA
##2016-04-01 09:10:00 NA NA
##2016-04-01 09:15:00 1371.25 231.90
##2016-04-01 09:20:00 1376.35 231.55