elasticsearchelasticsearch-aggregation

Does Elastic Search have an equivalent to SQL Group By aggregation?


I know Elastic Search could do aggregation, but you know DB aggregation is not totally same as ES world.

Like we do below in DB:

select a, b, c, d, sum(e), sum(f)
from some_table nolock
group by a, b, c, d

It will return:

a, b, c, d, sum e, sum f
1, 2, 3, 4, 100  , 100
1, 2, 3, 5, 150  , 150
...

So each a+b+c+d as different key, will have one row return.

But how can I do it in ES aggregation? It will return nested result and if I aggregation with different sequence, like a->b->c->d result will be different with d->c->b->a result. But in DB, group by a,b,c,d will not different with group by d,c,b,a.

Any suggestion is welcome, Thanks

Have tried xpack and parallelly aggregation, xpack hard to use, parallelly not as expected


Solution

  • First of all there is really no difference between a->b->c->d and d->c->b->a result as long as you process the keys correctly, the only difference will be the order. Here is an example:

    DELETE test
    PUT test
    {
      "mappings": {
        "properties": {
          "a": {
            "type": "integer"
          },
          "b": {
            "type": "integer"
          },
          "c": {
            "type": "integer"
          },
          "d": {
            "type": "integer"
          },
          "e": {
            "type": "integer"
          },
          "f": {
            "type": "integer"
          }
        }
      }
    }
    
    
    POST test/_bulk?refresh
    {"index": {"_id": 1}}
    {"a": 1, "b": 2, "c": 3, "d": 4, "e": 50, "f": 50 }
    {"index": {"_id": 2}}
    {"a": 1, "b": 2, "c": 3, "d": 4, "e": 50, "f": 50 }
    {"index": {"_id": 3}}
    {"a": 1, "b": 2, "c": 3, "d": 5, "e": 50, "f": 50 }
    {"index": {"_id": 4}}
    {"a": 1, "b": 2, "c": 3, "d": 5, "e": 50, "f": 50 }
    {"index": {"_id": 5}}
    {"a": 1, "b": 2, "c": 3, "d": 5, "e": 50, "f": 50 }
    
    POST test/_search
    {
      "size": 0,
      "aggs": {
        "a": {
          "terms": {
            "field": "a"
          },
          "aggs": {
            "b": {
              "terms": {
                "field": "b"
              },
              "aggs": {
                "c": {
                  "terms": {
                    "field": "c"
                  },
                  "aggs": {
                    "d": {
                      "terms": {
                        "field": "d"
                      },
                      "aggs": {
                        "sum_e": {
                          "sum": {
                            "field": "e"
                          }
                        },
                        "sum_f": {
                          "sum": {
                            "field": "f"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    
    POST test/_search
    {
      "size": 0,
      "aggs": {
        "d": {
          "terms": {
            "field": "d"
          },
          "aggs": {
            "b": {
              "terms": {
                "field": "b"
              },
              "aggs": {
                "c": {
                  "terms": {
                    "field": "c"
                  },
                  "aggs": {
                    "a": {
                      "terms": {
                        "field": "a"
                      },
                      "aggs": {
                        "sum_e": {
                          "sum": {
                            "field": "e"
                          }
                        },
                        "sum_f": {
                          "sum": {
                            "field": "f"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    
    

    However, you might be more comfortable with composite aggregations that will produce flatter results closer to what you expect from SQL.

    POST test/_search
    {
      "size": 0,
      "aggs": {
        "my_groups": {
          "composite": {
            "sources": [
              {
                "a": {
                  "terms": {
                    "field": "a"
                  }
                }
              },
              {
                "b": {
                  "terms": {
                    "field": "b"
                  }
                }
              },
              {
                "c": {
                  "terms": {
                    "field": "c"
                  }
                }
              },
              {
                "d": {
                  "terms": {
                    "field": "d"
                  }
                }
              }
            ]
          },
          "aggs": {
            "sum_e": {
              "sum": {
                "field": "e"
              }
            },
            "sum_f": {
              "sum": {
                "field": "f"
              }
            }
          }
        }
      }
    }
    

    But you can also allow Elasticsearch to write that aggregation for you and get exactly what you expected:

    POST /_sql?format=csv
    {
      "query": "SELECT a, b, c, d, sum(e), sum(f) FROM test GROUP BY a, b, c, d",
      "fetch_size": 10
    }