I have a type product in elasticsearch that contains a column containing multiple ids, some of them are the same. There are also columns containing current price and the quantity. I want the to get the sum of price * qte of each unique id.
id price qte
__ _____ _________
1 25 4
1 25 4
1 25 4
2 38 2
2 38 2
3 12 3
3 12 3
3 12 3
3 12 3
4 33 6
5 64 8
5 64 8
(if you're wondering why it's like that, it's cuz there are other columns with different values for each, also note that each unique id has a unique price and a unique quantity)
So I created my aggregation :
$id = new \Elastica\Aggregation\Terms('id');
$id->setField('id')->setSize(0);
$qte_price = new \Elastica\Aggregation\Sum('qte_price');
$qte_price->setScript('doc["price"].value * doc["qte"].value');
$id->addAggregation($qte_price);
The problem here, is that qte_price
doesn't use the first aggregation on the id
before performing the setScript()
, thus summing the total of price * qte
on all ids even the repeated ones.
In other words I want to calculate for id=1 => 25*4
(and not (25*4)*3
, for id=2 => 38*2
(and not (38*2)*2)
.. etc
I found a trick to overcome this problem is by dividing the answer by the doc_count, but I'm looking for an official way of doing this using Elastica.
I would use the avg
aggregation for both the price
and qte
fields and then use a bucket_script
pipeline aggregation to multiply the average price by the average quantity and that would do what you expect.
The average of 3*25 is 25 and the average of 3*4 is 4, then you can multiply 25 by 4 and you get your answer for id = 1... The same goes for the other ids.
In pure DSL, it'd look like this:
{
"size": 0,
"aggs": {
"by_id": {
"terms": {
"field": "id"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"avg_qte": {
"avg": {
"field": "qte"
}
},
"price_by_qte": {
"bucket_script": {
"buckets_path": {
"avgPrice": "avg_price",
"avgQte": "avg_qte"
},
"script": "params.avgPrice * params.avgQte"
}
}
}
}
}
}
Expressed in Elastica, it would go like this:
$id = new \Elastica\Aggregation\Terms('id');
$id->setField('id')->setSize(0);
$avg_price = new \Elastica\Aggregation\Avg('avg_price');
$avg_price->setField('price');
$id->addAggregation($avg_price);
$avg_qte = new \Elastica\Aggregation\Avg('avg_qte');
$avg_qte->setField('qte');
$id->addAggregation($avg_qte);
$bucketScriptAggregation = new BucketScript(
'price_by_qte',
[
'avgPrice' => 'avg_price',
'avgQte' => 'avg_qte',
],
'params.avgPrice * params.avgQte'
);
$id->addAggregation($bucketScriptAggregation);