elasticsearchkibanatimelion

Kibana Timelion: Subselect or Subquery to aggregate sum of max


Let's suppose I have the following data on ElasticSearch:

@timestamp; userId; currentPoints
August 7th 2017, 00:30:37.319; myUserName; 4
August 7th 2017, 00:43:22.121; myUserName; 10
August 7th 2017, 00:54:29.177; myUserName; 7
August 7th 2017, 01:10:29.352; myUserName; 4
August 7th 2017, 00:32:37.319; myOtherUserName; 12
August 7th 2017, 00:44:22.121; myOtherUserName; 17
August 7th 2017, 00:56:29.177; myOtherUserName; 8
August 7th 2017, 01:18:29.352; myOtherUserName; 11

I'm looking to draw a date histogram that will show me the sum of all max:currentPoints per username per hour, which whould generate the following data to plot:

August 7th 2017, 00; SumOfMaxCurrentPoints -> 27 (max from hour 00h from both users 10 + 17)
August 7th 2017, 00; SumOfMaxCurrentPoints -> 15 (max from hour 01h from both users 4 + 11)

This would usually be done with a subquery, extracting the max(currentPoints) for each hour, user and then sum the results and aggregate per hour.

Is this possible with Kibana Timelion for instance? I can't find a way to achieve this using the documentation.

Thanks Alex


Solution

  • While working on another project, I've stumpled upon the answer to do this in Kibana/Elasticsearch without using Timelion.

    The feature is called Sibling Pipeline Aggregation, and in this case you use the Sum Bucket. You can use it with any recent Kibana/Elastic visualization (I'm using version 5.5).

    For a dataset such as:

    @timestamp; userId; currentPoints
    August 7th 2017, 00:30:37.319; myUserName; 4
    August 7th 2017, 00:43:22.121; myUserName; 10
    August 7th 2017, 00:54:29.177; myUserName; 7
    August 7th 2017, 01:10:29.352; myUserName; 4
    August 7th 2017, 00:32:37.319; myOtherUserName; 12
    August 7th 2017, 00:44:22.121; myOtherUserName; 17
    August 7th 2017, 00:56:29.177; myOtherUserName; 8
    August 7th 2017, 01:18:29.352; myOtherUserName; 11
    

    Where you want an hourly SUM of(currentPoints) all MAXs(currentPoints) per userId, resulting in:

    August 7th 2017, 00; SumOfMaxCurrentPoints -> 27 (max from hour 00h from both users 10 + 17)
    August 7th 2017, 00; SumOfMaxCurrentPoints -> 15 (max from hour 01h from both users 4 + 11)
    

    You can do:

    Metric

    1. Aggregation: Sibling Pipeline Aggregation (Sum Bucket)
    2. Bucket Aggregation Type: Terms
    3. Bucket Field: userId
    4. Bucket Size: Comfortable value above the # of users if you want total precision
    5. Metric Aggregation: Max
    6. Metric Field: currentPoints

    Bucket

    1. Buckets type: Split Rows
    2. Bucket Aggregation: Date Histogram
    3. Histogram Field: @timestamp
    4. Histogram Interval: Hourly