
MongoDB explain for the Aggregation Framework


Is there an explain function for the Aggregation framework in MongoDB? I can't see it in the documentation.

If not, is there some other way to check how a query performs within the aggregation framework?

I know with find you just do

db.collection.find().explain()

But with the aggregation framework I get an error

db.collection.aggregate(
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    { 
        $group: 
        { 
            _id : { id: "$_id"},
            "count": { $sum:1 } 
        }
    },
    { $sort: {"count":-1}}
).explain()

Solution

  • Starting with MongoDB version 3.0, simply changing the order from

    collection.aggregate(...).explain()
    

    to

    collection.explain().aggregate(...)
    

    will give you the desired results (see the db.collection.explain() documentation).
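
As a minimal sketch, here is the pipeline from the question in that form. It assumes the stages are passed as a single array and that the collection is literally named collection, as in the question. The typeof db guard is my addition so the snippet does nothing outside the mongo shell.

```javascript
// The question's pipeline, written as an array of stages.
var questionPipeline = [
    { $project: { "Tags._id": 1 } },
    { $unwind: "$Tags" },
    { $match: { $or: [{ "Tags._id": "tag1" }, { "Tags._id": "tag2" }] } },
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
];

// `db` only exists inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
    // MongoDB 3.0+: explain() comes before aggregate(), not after it.
    printjson(db.collection.explain().aggregate(questionPipeline));
}
```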

    For older versions (2.6 and later), pass the explain option to aggregate() as explain: true in the options document that follows the pipeline:

    db.collection.aggregate([
        { $project : { "Tags._id" : 1 }},
        { $unwind : "$Tags" },
        { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
        { $group: { 
            _id : "$_id",
            count: { $sum:1 } 
        }},
        {$sort: {"count":-1}}
      ],
      {
        explain:true
      }
    )
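
The explain output is a nested document, so finding the part that shows index usage can take some digging. The sketch below is one possible way to summarise it, not an official API: collectPlanStages is a hypothetical helper that collects every value stored under a "stage" key. It assumes the query planner format used by MongoDB 3.0 and later (stage names such as IXSCAN or COLLSCAN), and it will also pick up stages from any rejected plans.

```javascript
// Hypothetical helper: walk an explain document and collect every
// query-planner stage name (string values stored under a "stage" key).
function collectPlanStages(node, found) {
    found = found || [];
    if (Array.isArray(node)) {
        node.forEach(function (item) { collectPlanStages(item, found); });
    } else if (node !== null && typeof node === "object") {
        Object.keys(node).forEach(function (key) {
            if (key === "stage" && typeof node[key] === "string") {
                found.push(node[key]);
            } else {
                collectPlanStages(node[key], found);
            }
        });
    }
    return found;
}

// `db` only exists inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
    var explainOutput = db.collection.aggregate(
        [{ $match: { "Tags._id": "tag1" } }],
        { explain: true }
    );
    // IXSCAN in the list suggests an index was used; COLLSCAN a full scan.
    printjson(collectPlanStages(explainOutput));
}
```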
    

    An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. $match, $sort, or $geoNear at the beginning of a pipeline), as well as by subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project, $unwind, and $group), further manipulation happens in memory (possibly using temporary files if the allowDiskUse option is set).
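
To illustrate, one way to let the pipeline from the question use an index is to filter on the tag values before anything else. This is a sketch under assumptions: the index on "Tags._id" is hypothetical (the question does not say which indexes exist), and $in is used as shorthand for the original $or. The second $match is kept because the first one selects whole documents, while the one after $unwind discards the non-matching tags inside them.

```javascript
var tagFilter = { "Tags._id": { $in: ["tag1", "tag2"] } };

var indexedPipeline = [
    { $match: tagFilter },              // first stage: can use an index
    { $project: { "Tags._id": 1 } },
    { $unwind: "$Tags" },
    { $match: tagFilter },              // in memory: drops the other tags
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
];

// `db` only exists inside the mongo shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
    // Hypothetical index; createIndex() and explain() need MongoDB 3.0+.
    db.collection.createIndex({ "Tags._id": 1 });
    printjson(db.collection.explain().aggregate(indexedPipeline));
}
```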

    Optimizing pipelines

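    In general, you can optimize aggregation pipelines by starting them with stages that can use an index, such as $match and $sort, so that less data has to be processed in memory.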

    There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.

    Limitations

    As of MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed, but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution, you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.
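
As a sketch of such an equivalent query for the pipeline in the question, the filter below mirrors its $match stage, with $in as shorthand for the original $or. The collection name and tag values come from the question; the variable names are mine, and the verbosity argument to explain() assumes MongoDB 3.0 or later.

```javascript
// Same condition as the pipeline's $match stage.
var initialQuery = { "Tags._id": { $in: ["tag1", "tag2"] } };

// `db` only exists inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
    var stats = db.collection.find(initialQuery).explain("executionStats");
    // executionStats reports documents returned, keys and documents
    // examined, and execution time for the winning plan.
    printjson(stats.executionStats);
}
```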

    There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines: