Is there an explain function for the Aggregation framework in MongoDB? I can't see it in the documentation.
If not is there some other way to check, how a query performs within the aggregation framework?
I know with find you just do
db.collection.find().explain()
But with the aggregation framework I get an error
db.collection.aggregate(
{ $project : { "Tags._id" : 1 }},
{ $unwind : "$Tags" },
{ $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
{
$group:
{
_id : { id: "$_id"},
"count": { $sum:1 }
}
},
{ $sort: {"count":-1}}
).explain()
Starting with MongoDB version 3.0, simply changing the order from
collection.aggregate(...).explain()
to
collection.explain().aggregate(...)
will give you the desired results (documentation here).
For older versions >= 2.6, you will need to use the explain
option for aggregation pipeline operations
explain:true
db.collection.aggregate([
{ $project : { "Tags._id" : 1 }},
{ $unwind : "$Tags" },
{ $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
{ $group: {
_id : "$_id",
count: { $sum:1 }
}},
{$sort: {"count":-1}}
],
{
explain:true
}
)
An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match
, $sort
, $geonear
at the beginning of a pipeline) as well as subsequent $lookup
and $graphLookup
stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project
, $unwind
, and $group
) further manipulation will be in-memory (possibly using temporary files if the allowDiskUse
option is set).
In general, you can optimize aggregation pipelines by:
$match
stage to restrict processing to relevant documents.$match
/ $sort
stages are supported by an efficient index.$match
, $limit
, and $skip
.There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.
As at MongoDB 3.4, the Aggregation Framework explain
option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats
mode for a find()
query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain()
query with executionStats
or allPlansExecution
verbosity.
There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines: