elasticsearchpyelasticsearch

Aggregations group by parameters in Elasticsearch


I have documents in elasticsearch that are something like:

    {
      "numberOfBedrooms": 2,
      "price": 1500,
      "type": flat
    }

I would like to get statistics like what is the average price by room, what is the average price by type and also combinations like what is the average price per numberOfBedroom+type combinations. How can I use aggregations in elastic search to achieve that?

Thanks!


Solution

  • To show the average price by number of rooms, create a query with two levels of nested aggregation.

    For example this query:

    curl -XGET 'http://localhost:9200/myindex/houses/_search?pretty&search_type=count' -d '{
       "query" : {
            "match_all" : { }
       },
       "aggs": {
         "bed_agg": {
           "terms": {"field": "numberOfBedrooms"},        
             "aggs" : {
                 "avg_price" : { "avg" : { "field" : "price" } }
             }
           }
         }
       }
     }'
    

    Should return something like:

      "aggregations" : {
        "bed_agg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
        "key" : 2,
        "doc_count" : 2,
        "avg_price" : {
          "value" : 1750.0
        }
          }, {
        "key" : 3,
        "doc_count" : 1,
        "avg_price" : {
          "value" : 2100.0
        }
          } ]
        }
    

    To do aggregation over another level (e.g. adding in the building type) you can either create a new level of aggregation - e.g. nest the "type" aggregation inside the "bedrooms" aggregation.

    curl -XGET 'http://localhost:9200/myindex/houses/_search?pretty&search_type=count' -d '{
      "query" : {
           "match_all" : { }
      },
      "aggs": {
        "bed_agg": {
          "terms": {"field": "numberOfBedrooms"},
          "aggs": {
          "type_agg": {
            "terms": {"field": "type"},          
            "aggs" : {
              "avg_price" : { "avg" : { "field" : "price" } }
             }
            }
           }
          }
        }
      }
    }'
    

    Alternatively you can create a single bucket with both fields in it using a script:

     "aggs": {
            "bed_type_agg": {
              "terms": { "script" : "doc[\"numberOfBedrooms\"].value+doc[\"type\"].value"},       
                  "aggs" : {
                   "avg_price" : { "avg" : { "field" : "price" } }
                }
              }
            }