group-byelasticsearchfaceted-searchfacet

ElasticSearch group by multiple fields


The only close thing that I've found was: Multiple group-by in Elasticsearch

Basically I'm trying to get the ES equivalent of the following MySql query:

select gender, age_range, count(distinct profile_id) as count 
FROM TABLE group by age_range, gender

The age and gender by themselves were easy to get:

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "ages": {
      "terms": {
        "field": "age_range",
        "size": 20
      }
    },
    "gender_by_age": {
      "terms": {
        "fields": [
          "age_range",
          "gender"
        ]
      }
    }
  },
  "size": 0
}

which gives:

{
  "ages": {
    "_type": "terms",
    "missing": 0,
    "total": 193961,
    "other": 0,
    "terms": [
      {
        "term": 0,
        "count": 162643
      },
      {
        "term": 3,
        "count": 10683
      },
      {
        "term": 4,
        "count": 8931
      },
      {
        "term": 5,
        "count": 4690
      },
      {
        "term": 6,
        "count": 3647
      },
      {
        "term": 2,
        "count": 3247
      },
      {
        "term": 1,
        "count": 120
      }
    ]
  },
  "total_gender": {
    "_type": "terms",
    "missing": 0,
    "total": 193961,
    "other": 0,
    "terms": [
      {
        "term": 1,
        "count": 94799
      },
      {
        "term": 2,
        "count": 62645
      },
      {
        "term": 0,
        "count": 36517
      }
    ]
  }
}

But now I need something that looks like this:

[breakdown_gender] => Array
    (
        [1] => Array
            (
                [0] => 264
                [1] => 1
                [2] => 6
                [3] => 67
                [4] => 72
                [5] => 40
                [6] => 23
            )

        [2] => Array
            (
                [0] => 153
                [2] => 2
                [3] => 21
                [4] => 35
                [5] => 22
                [6] => 11
            )

    )

Please note that 0,1,2,3,4,5,6 are "mappings" for the age ranges so they actually mean something :) and not just numbers. e.g. Gender[1] (which is "male") breaks down into age range [0] (which is "under 18") with a count of 246.


Solution

  • As you only have 2 fields a simple way is doing two queries with single facets. For Male:

    {
        "query" : {
          "term" : { "gender" : "Male" }
        },
        "facets" : {
            "age_range" : {
                "terms" : {
                    "field" : "age_range"
                }
            }
        }
    }
    

    And for female:

    {
        "query" : {
          "term" : { "gender" : "Female" }
        },
        "facets" : {
            "age_range" : {
                "terms" : {
                    "field" : "age_range"
                }
            }
        }
    }
    

    Or you can do it in a single query with a facet filter (see this link for further information)

    {
        "query" : {
           "match_all": {}
        },
        "facets" : {
            "age_range_male" : {
                "terms" : {
                    "field" : "age_range"
                },
                "facet_filter":{
                    "term": {
                        "gender": "Male"
                    }
                }
            },
            "age_range_female" : {
                "terms" : {
                    "field" : "age_range"
                },
                "facet_filter":{
                    "term": {
                        "gender": "Female"
                    }
                }
            }
        }
    }
    

    Update:

    As facets are about to be removed. This is the solution with aggregations:

    {
      "query": {
        "match_all": {}
      },
      "aggs": {
        "male": {
          "filter": {
            "term": {
              "gender": "Male"
            }
          },
          "aggs": {
            "age_range": {
              "terms": {
                "field": "age_range"
              }
            }
          }
        },
        "female": {
          "filter": {
            "term": {
              "gender": "Female"
            }
          },
          "aggs": {
            "age_range": {
              "terms": {
                "field": "age_range"
              }
            }
          }
        }
      }
    }