ibm-watson watson-discovery

Get documents from Watson Discovery Service when doing a histogram


I am currently using the histogram feature in Watson Discovery, but I need the documents for each slice so that I can do further work on them (such as looking at average sentiment).

This is my query, which breaks my data down into 15-minute chunks:

filter(enriched_tweet.concepts.text:"'Hockey'").histogram(extracted_metadata.utc_timestamp,interval:900000)

But the response only tells me how many documents are in each "slice":

{
"matching_results": 444530,
"aggregations": [
    {
        "type": "filter",
        "match": "enriched_tweet.concepts.text:\"'Hockey'\"",
        "matching_results": 69556,
        "aggregations": [
            {
                "type": "histogram",
                "field": "utc_timestamp",
                "interval": 900000,
                "results": [
                    {
                        "key": 1498227300000,
                        "matching_results": 180
                    },
                    {
                        "key": 1498228200000,
                        "matching_results": 258
                    },
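For reference, the bucket keys in the excerpt above look like epoch timestamps in milliseconds, spaced 900000 ms apart. A minimal Python sketch, assuming each `key` marks the start of its bucket (the `bucket_window` helper is hypothetical, not part of any Watson SDK), that turns the keys into readable UTC windows:

```python
from datetime import datetime, timedelta, timezone

INTERVAL_MS = 900_000  # 15 minutes, matching interval:900000 in the query


def bucket_window(key_ms, interval_ms=INTERVAL_MS):
    """Return (start, end) of a bucket, assuming key_ms is the bucket start in epoch ms."""
    start = datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc)
    end = start + timedelta(milliseconds=interval_ms)
    return start, end


# The two buckets visible in the truncated response above.
results = [
    {"key": 1498227300000, "matching_results": 180},
    {"key": 1498228200000, "matching_results": 258},
]

windows = [bucket_window(r["key"]) for r in results]
for (start, end), r in zip(windows, results):
    print(start.isoformat(), "to", end.isoformat(), "->", r["matching_results"], "documents")
```

Under that assumption, the end of one window is the start of the next, so each document falls into exactly one slice.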

Extension to the answer below

So, you can perform actions on the data in the buckets even though you don't see them in your results. For example, the following will work:

filter(enriched_tweet.concepts.text:"'Hockey'").histogram(utc_timestamp,interval:900000).sum(followers)

What I want is an array of documents for each slice, so I can then go over them and do sums on them to work out, for example, the sentiment of a 15-minute interval.


Solution

  • You can nest a sum aggregation under your histogram aggregation to sum a field within each histogram bucket. See https://www.ibm.com/watson/developercloud/doc/discovery/query-reference.html#aggregations for more about aggregations.
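As a rough illustration of how a nested sum could be turned into a per-slice average, here is a Python sketch. It assumes each histogram bucket carries its nested results in an `aggregations` list with a `value` field, mirroring the nesting in the response above. That shape, the `bucket_averages` helper, and the `value` numbers in the sample are assumptions for illustration, not output from the service.

```python
def bucket_averages(histogram):
    """Map each bucket key to (nested sum / document count).

    Assumes every bucket has a nested entry with "type": "sum" and a
    numeric "value"; buckets without one, or with no documents, are skipped.
    """
    averages = {}
    for bucket in histogram["results"]:
        count = bucket["matching_results"]
        nested = bucket.get("aggregations", [])
        total = next((a["value"] for a in nested if a.get("type") == "sum"), None)
        if total is not None and count:
            averages[bucket["key"]] = total / count
    return averages


# Hypothetical response fragment: keys and counts are from the question,
# the "value" figures are invented for this example.
sample = {
    "type": "histogram",
    "field": "utc_timestamp",
    "interval": 900000,
    "results": [
        {"key": 1498227300000, "matching_results": 180,
         "aggregations": [{"type": "sum", "field": "followers", "value": 90000}]},
        {"key": 1498228200000, "matching_results": 258,
         "aggregations": [{"type": "sum", "field": "followers", "value": 51600}]},
    ],
}

averages = bucket_averages(sample)
print(averages)
```

Dividing by `matching_results` gives a true average only if every document in the bucket actually has the summed field; otherwise the result understates it.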