I have 1 min OHLC bars on some market data that I imported into Mongo DB.
Each document looks like this:
{
"_id" : ObjectId("5ac3163f31a0632c7642ca1c"),
"Date" : "08/06/2007",
"Time" : "15:01",
"Open" : 1310,
"High" : 1310.25,
"Low" : 1309.5,
"Close" : 1310,
"Up" : 209,
"Down" : 165,
"Volume" : 0
}
I want to build a function that allows me to quickly generate X-bar intervals from this data. i.e. generate output 5-min bars, 1 hr bars, daily bars, etc... I also want to be able to filter out a data range.
I've been playing around with Mongo's aggregation functions, but I'm getting overwhelmed how I should approach this AND how I should order the pipeline operations.
Do I first group by 'Date', then sort by 'Time', then group again by $first, $last, $max and $min?
Or do I first create a new field somehow combining 'Date' and 'Time' and then proceed to the grouping?
Although don't I need to first somehow convert the "Date" and "Time" fields from string to Date field so that Mongo knows how to sort and match properly? ...but then which order would I do that in?
I'm still a newbie to MongoDB, so any advice would be appreciated.
Ok, I've come up with a solution:
db.minbars.aggregate([
{
$project:
{
dts:
{
$dateFromString:
{
dateString:
{
$concat: ['$Date', '$Time']
}
}
},
Open:1,
High:1,
Low:1,
Close:1
}
},
{
$match:
{
dts:
{
$gte: ISODate("2016-01-01T00:00:00.000Z"),
$lte: ISODate("2016-12-31T00:00:00.000Z")
}
}
},
{
$sort: { dts : 1 }
},
{
$group:
{
_id:
{
year: {$year: "$dts"},
month: {$month: "$dts"},
day: {$dayOfMonth: "$dts"},
hour: {$hour: "$dts"},
min:
{
$add:
[
{$subtract:
[
{$minute: "$dts"},
{$mod: [{$minute: "$dts"}, 5]}
]},
5
]
}
},
Open: {$first: "$Open"},
High: {$max: "$High"},
Low: {$min: "$Low"},
Close: {$last: "$Close"}
}
}
], {allowDiskUse: true})
Here's an explanation for each pipeline stage:
Use 'dateFromString' from combine 'Date' and 'Time' into an ISODate object ('dts' - which stands for date timestamp). Preserve the other OHLC fields.
Filter out based on a date range
Sort by the new ISODate object ('dts').
Group together all those documents with the same Year, Month, Day, Hour, and 5-minute minute interval. The minute interval uses the formula: minute = minuteIn - (minuteIn % i) + i, where i=minute interval. I'm adding 'i' so that minutes 00, 01, 02, 03, and 04 are aggregated to the next 05 minute interval (and not the preceding 00 minute interval). NOTE: if you want 1-hr, 4-hr, Daily bars, etc...then you need to adjust the _id section accordingly.
NOTE: I'm using {allowDiskUse: true} here because at one point I ran into the Memory constraints at the Sort stage.
Maybe someone can come up with a simpler way to do this?
UPDATE:
As I noted in 4) above, I mentioned I was adding "i" (minute interval) to the resulting minute. However, when I did this I ended up with a '60' minute interval showing up in the output. You should only have 0, 5, 10, 15, ...55 minute bars and there should NOT be a 60-minute bar. So this was NOT correct.
Also, if you compare against a trading platform (i.e. Thinkorswim) you can see that the standard practice is to use preceding 5-minute interval for the timestamp of the bar. For example, the 5-minute bar 9:25 represents the aggregation of these minute bars: 9:25, 9:26, 9:27, 9:28, 9:29.