elasticsearch cardinality

How to query the distinct count distribution in Elasticsearch


The cardinality aggregation calculates an approximate count of distinct values. How can we calculate the distribution of occurrence counts, that is, how many distinct values occur exactly n times?

For example, suppose we have:

a,a,a,b,b,b,c,c,d,d,e

and the distinct count distribution is:

3: 2 # count of distinct elements that have 3 occurrences (a, b)
2: 2 # c, d
1: 1 # e
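
To make the target concrete, here is a minimal client-side sketch in Python of the computation being asked for. It is not an Elasticsearch query; the function name count_distribution is only illustrative.

```python
from collections import Counter


def count_distribution(values):
    """Map each occurrence count to the number of distinct values with that count."""
    occurrences = Counter(values)  # value -> number of occurrences
    # Count how many distinct values share each occurrence count.
    return dict(Counter(occurrences.values()))


letters = "a,a,a,b,b,b,c,c,d,d,e".split(",")
print(count_distribution(letters))  # {3: 2, 2: 2, 1: 1}
```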

Solution

  • Actually, you cannot do an aggregation like this directly.

    However, using the transform API (https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-examples.html), you can create a new index that stores the number of occurrences of each value, and then run a simple terms aggregation on it:

    PUT _transform/so
    {
      "dest": {
        "index": "my-so"
      },
      "source": {
        "index": "my-index"
      },
      "pivot": {
        "group_by": { 
          "country": {
            "terms": {
              "field": "letter"
            }
          }
        },
        "aggregations": {
          "cardinality": {
            "value_count": { 
              "field" : "letter"
            }
          }
        }
      }
    }
    

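    Once the transform has been started (POST _transform/so/_start), the destination index my-so will contain one document per distinct value, for example: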

    [
        {
          "country" : "a",
          "cardinality" : 22
        },
        {
          "country" : "b",
          "cardinality" : 4
        },
        {
          "country" : "c",
          "cardinality" : 5049
        }...
    

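    Then you can run a simple terms or histogram aggregation on the new index. Each bucket key is an occurrence count, and its doc_count is the number of distinct values with that count: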

    GET /my-so/_search
    {
      "size" : 0,
      "aggs": {
        "cc": {
          "terms": {
            "field": "cardinality"
          }
        }
      }
    }
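
The buckets returned by the terms aggregation can be turned into the requested distribution on the client. The following Python sketch assumes the standard terms aggregation response shape (aggregations, then the aggregation name "cc", then buckets with key and doc_count). The helper name and the sample response are hypothetical and match the example from the question, not real cluster output.

```python
def distribution_from_response(response, agg_name="cc"):
    """Turn terms-aggregation buckets into {occurrence_count: distinct_values}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical response body for the a,a,a,b,b,b,c,c,d,d,e example.
sample_response = {
    "aggregations": {
        "cc": {
            "buckets": [
                {"key": 3, "doc_count": 2},
                {"key": 2, "doc_count": 2},
                {"key": 1, "doc_count": 1},
            ]
        }
    }
}

print(distribution_from_response(sample_response))  # {3: 2, 2: 2, 1: 1}
```

Note that a terms aggregation returns only the top 10 buckets by default, so a larger "size" may be needed if there are many different occurrence counts.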