mongodb mongodb-query aggregation-framework query-optimization mongosh

How to index a mongo complex query

I really don't understand how to improve the performance of my query using index.

db.Vendite.aggregate([{$group: { 
_id: {
anno: "$anno", mese: "$mese", cod_age: "$cod_age", cod_int: "$cod_int",  cod_cli: "$cod_cli", cod_linea_comm: "$cod_linea_comm", cod_sett_comm: "$cod_sett_comm", _art: "$cod_art"}, 
vendite_quantita: {
   $sum: { 
      $add: [ {
         $subtract: ["$ven_quantita", "$res_quantita"]}, "$oma_quantita"]}}, 
            vendite_quantita_pz: {
                $sum: {
                   $add: [{
                      $subtract: ["$ven_quantita_pz", "$res_quantita_pz"]}, 
                        "$oma_quantita_pz"]}}, 
vendite_netto: {
   $sum: {
      $subtract: [{
         $subtract: ["$ven_lordo", {
            $add: ["$ven_incondizionato", "$ven_finanziari", "$ven_canvass", "$ven_offerta"]}]},
               {$subtract: ["$res_lordo", {$add: ["$res_incondizionato", "$res_finanziari", "$res_offerta", "$res_offerta"]}]}]} }}}, 
{ $sort: { "anno": 1, "mese": 1, "cod_age": 1, "cod_int": 1, "cod_cli": 1, "cod_linea_comm": 1, "cod_sett_comm": 1, "cod_art": 1 }}]).explain("executionStats")

This is my query and, watching mongoDB university tutorial, I know that it's better to use an index with match > sort > range. In my case I have tried with the following index:

anno_1_mese_1_cod_age_1_cod_int_1_cod_cli_1_cod_linea_comm_1_cod_sett_comm_1_cod_art_1

That is my sort order.

There is not any improvement about query time. My first thought is that the query is too detailed ... I mean the collection has 2.7 M records and return back 2.1 M results, so indexing could help but not that much. Is it correct or I am doing something wrong?

Should I take care of my sum fields too in the indexing process?

Just to say, when I use the query without forcing any index by "hint", mongo executes it without any index.

Edit: [MongoDB version 7.0.6] [Mongosh version 2.1.5]

Solution

If your query returns 2.1M of 2.7M records, I do not believe it will be ever fast. You have so many conditions and still so many results. Looks like an usual statistical query to me, which are usually slow. Sure, with proper index it should be faster than without. I can tell no more than the usual indexing recommendations: start your index with the field, which limits more results. For example the field "anno" (year, right?) I would not put to the first place because it will not limit you probably a lot of entries. Sometimes also the smaller index is much faster than the bigger one. That might be a reason for your query planner not to use your huge index. Try to create really small index only with the fields that really limit your output. If every row contains the same year, logically it does not make too much sense to include it to the index. I personally would create some workaround, but without knowing more details it is hard to tell.