mongodb, aggregation-framework

Why is one of my indexes not chosen by MongoDB for a specific query?


I have the following post document:

post = {"_id": "postID1", "userID": "userID1", "likeNum": 10, "url": "anyUrl"}

And I have a query:

posts_collection.find({"userID": {"$nin": ["uid1", "uid2"]}, "_id": {"$nin": ["postID1", "postID2"]}}).sort("likeNum", DESCENDING).limit(30).explain()

And my aggregation pipeline:

pipeline = [
    {"$match": {"userID": {"$nin": ["uid1", "uid2"]}, "_id": {"$nin": ["postID1", "postID2"]}}},
    {"$sort": {"likeNum": -1}},
    {"$group": {"_id": "$userID", "posts": {"$firstN": {"input": "$$ROOT", "n": 2}}}},
    {"$unwind": "$posts"},
    {"$replaceWith": "$posts"},
    {"$sort": {"likeNum": -1}},
    {"$limit": 30},
]
posts_collection.aggregate(pipeline)
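For reference, the intent of that pipeline (top 2 posts per user by likeNum, then top 30 overall) can be sketched in pure Python. This is an assumed reading of the $group/$firstN/$unwind/$replaceWith steps, not pymongo code:

```python
def top_posts(posts, excluded_users=(), excluded_ids=(), per_user=2, limit=30):
    # $match: drop excluded userIDs and _ids
    matched = [p for p in posts
               if p["userID"] not in excluded_users and p["_id"] not in excluded_ids]
    # $sort: order by likeNum descending
    matched.sort(key=lambda p: p["likeNum"], reverse=True)
    # $group with $firstN: keep the first `per_user` documents per user
    # (they are already in sort order at this point)
    groups = {}
    for p in matched:
        groups.setdefault(p["userID"], []).append(p)
    # $unwind + $replaceWith: flatten the grouped arrays back into documents
    flattened = [p for docs in groups.values() for p in docs[:per_user]]
    # final $sort + $limit
    flattened.sort(key=lambda p: p["likeNum"], reverse=True)
    return flattened[:limit]
```

For example, with three posts by one user and one by another, the third post of the first user is dropped before the final sort.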

I have built an index on likeNum: -1 and another compound index on userID: 1, _id: 1, likeNum: -1.

The explain() output always shows that it goes for likeNum: -1 and performs a collection scan.

The explain output for my aggregation pipeline shows the same thing in the Query Profiler of my MongoDB Atlas dashboard.

Is it because an in-memory sort after the $nin filter would be less performant than the collection scan? Or is there another reason?

Note: sometimes the query has only the userID filter without _id, or the opposite (_id without userID), or neither filter (in that last case I understand why likeNum: -1 is used).

The collection holds about 10k documents, if that helps explain the behavior.


Solution

  • You have the condition {"_id": {"$nin": ["postID1", "postID2"]}}, which selects all but 2 documents. In such a case it does not make much sense to use an index, because you would need to read almost the entire index plus almost all of the documents.

    The query is faster when you read just the data and skip the index.
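A back-of-envelope selectivity count illustrates the point. The data below is made up (10k synthetic documents, mirroring the collection size from the question); the takeaway is that a two-value $nin rejects almost nothing, so an index on the filter fields cannot narrow the scan:

```python
# Synthetic stand-in for the 10k-document collection described in the question.
posts = [{"_id": f"postID{i}", "userID": f"userID{i % 100}", "likeNum": i % 500}
         for i in range(10_000)]

excluded_ids = {"postID1", "postID2"}
matching = [p for p in posts if p["_id"] not in excluded_ids]

# The filter rejects only 2 of 10,000 documents, so nearly every
# index entry and nearly every document would be read anyway.
selectivity = len(matching) / len(posts)
print(selectivity)  # 0.9998
```

With selectivity that close to 1, scanning the collection once and sorting (or walking the likeNum index in order) beats touching both the index and the documents.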