I have setup snowplow with Elasticsearch.
When I want to get the data out I just do normal queries and use aggregates to get them by day, country etc.
So I want to figure out clickthru rate for these aggregations, I have 2 kind of events: page views and clicks.
Currently I do 2 queries:
Page Views:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
Clicks:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
I format the response to something easier to use and then merge them in PHP using something like this.
function merge_metrics($pv,$c){
$r = array();
if(count($pv) > 0){
foreach ($pv as $key => $value) {
$r[$value['name']]['page_views'] += $value['count'];
}
}
if(count($c) > 0){
foreach ($c as $key => $value) {
$r[$value['name']]['clicks'] += $value['count'];
}
}
$rf = array();
foreach ($r as $key => $value) {
$tmp_clicks = isset($value['clicks']) ? $value['clicks'] : 0;
$tmp_page_views = isset($value['page_views']) ? isset($value['page_views']) : 0;
$rf[] = array(
'name' => $key,
'page_views' => $tmp_page_views,
'clicks' => $tmp_clicks,
'ctr' => ctr($tmp_clicks,$tmp_page_views)
);
}
return $rf;
}
Both $pv and $c are arrays that contain the aggregates that result from querying Elasticsearch and I do some formatting for ease of use.
My question is:
Is it possible get multiple metrics(in my case page views and clicks, these are specific filters) and perform same aggregations on both ? then returning the aggregations something like :
{
"data": [
{
"day": "2015-10-13",
"page_views": 61,
"clicks": 0,
},
{
"day": "2015-10-14",
"page_views": 135,
"clicks": 1,
},
{
"day": "2015-10-15",
"page_views": 39,
"clicks": 0,
}
]
}
But without me having to manually merge them ?
Yes, it is definitely possible if you merge your aggregations into one single query. For instance, I suppose you have one query like this for page views:
{
"query": {...}
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
}
}
}
}
}
And another query like this for clicks:
{
"query": {...}
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
Provided you have the same constraints in your query
, you can definitely merge them together at the date_histogram
level, like this:
{
"query": {...}
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
},
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
UPDATE
Since your queries are different for each of your aggregations, we need to do it slightly differently, i.e. by using an additional filters
aggregation, like this:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"event": [
"page_view",
"struct"
]
}
}
],
"should": {
"term": {
"se_action": "click"
}
},
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
},
"aggs": {
"my_filters": {
"filters": {
"filters": {
"page_views_filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
},
"clicks_filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
}
}
}
}
}
}
Now for each daily bucket, you're going to end up with two sub-buckets, one for the count of page views and another for the count of clicks.