I have the following post document:
post = {"_id": "postID1", "userID": "userID1", "likeNum": 10, "url": "anyUrl"}
And I have a query:
posts_collection.find({"userID": {"$nin": ["uid1", "uid2"]}, "_id": {"$nin": ["postID1", "postID2"]}}).sort("likeNum", DESCENDING).limit(30).explain()
And my aggregation pipeline:
pipeline = [
    {"$match": {"userID": {"$nin": ["uid1", "uid2"]}, "_id": {"$nin": ["postID1", "postID2"]}}},
    {"$sort": {"likeNum": -1}},
    {"$group": {"_id": "$userID", "posts": {"$firstN": {"input": "$$ROOT", "n": 2}}}},
    {"$unwind": "$posts"},
    {"$replaceWith": "$posts"},
    {"$sort": {"likeNum": -1}},
    {"$limit": 30}
]
posts_collection.aggregate(pipeline)
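To inspect the plan the aggregation actually uses, I can run the aggregate command with explain enabled (a sketch; the exact shape of the output varies by server version):

db = posts_collection.database
explain_output = db.command(
    "aggregate", posts_collection.name, pipeline=pipeline, explain=True
)
# The underlying query plan is usually under
# explain_output["stages"][0]["$cursor"]["queryPlanner"], or directly under
# explain_output["queryPlanner"], depending on the server version.
print(explain_output)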
I have built an index on likeNum: -1, and another compound index on userID: 1, _id: 1, likeNum: -1.
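For reference, the two indexes are created roughly like this (PyMongo sketch; the connection details and database/collection names are placeholders):

from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
posts_collection = client["mydb"]["posts"]  # placeholder names

# Single-field index backing the likeNum sort.
posts_collection.create_index([("likeNum", DESCENDING)])

# Compound index covering both $nin filters plus the sort key.
posts_collection.create_index(
    [("userID", ASCENDING), ("_id", ASCENDING), ("likeNum", DESCENDING)]
)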
The explain() method always shows that it goes for likeNum: -1 and performs a collection scan. The explained output of my aggregation pipeline shows the same information in the Query Profiler of my MongoDB Atlas dashboard.
Is it because an in-memory sort with the $nin filters would be less performant than the collection scan? Or is there another reason?
Note: sometimes the query has only the userID filter without the _id filter, sometimes the opposite (_id without userID), and sometimes neither of them (in that last case I understand why it uses likeNum: -1).
I have 10k documents in that collection, if that helps explain this behavior.
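For completeness, this is how I read the find() plan (a sketch; executionStats should be present with the default explain verbosity, but field availability can vary by server version):

query = {"userID": {"$nin": ["uid1", "uid2"]}, "_id": {"$nin": ["postID1", "postID2"]}}
plan = posts_collection.find(query).sort("likeNum", DESCENDING).limit(30).explain()

# Winning plan stage (e.g. COLLSCAN vs IXSCAN) and the amount of work done.
print(plan["queryPlanner"]["winningPlan"])
print(plan.get("executionStats", {}).get("totalKeysExamined"))
print(plan.get("executionStats", {}).get("totalDocsExamined"))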
You have the condition {"_id": {"$nin": ["postID1", "postID2"]}}, which means you will select all but 2 documents. In such a case it does not make much sense to use an index, because you would have to read almost the entire index and then fetch almost all of the documents anyway. The query is faster when it reads just the data and skips the index.
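You can check this yourself by forcing the compound index with hint() and comparing the execution stats against the plan the optimizer picks on its own (a sketch, assuming the compound index keys you listed):

from pymongo import ASCENDING, DESCENDING

query = {"userID": {"$nin": ["uid1", "uid2"]}, "_id": {"$nin": ["postID1", "postID2"]}}

# Plan chosen by the query optimizer.
chosen = posts_collection.find(query).sort("likeNum", DESCENDING).limit(30).explain()

# Same query forced onto the compound index.
forced = (
    posts_collection.find(query)
    .sort("likeNum", DESCENDING)
    .limit(30)
    .hint([("userID", ASCENDING), ("_id", ASCENDING), ("likeNum", DESCENDING)])
    .explain()
)

for label, plan in (("optimizer", chosen), ("hinted", forced)):
    stats = plan.get("executionStats", {})
    print(label, stats.get("totalKeysExamined"), stats.get("totalDocsExamined"),
          stats.get("executionTimeMillis"))

With only a couple of values excluded by each $nin, you should see the hinted plan examine nearly every index key and still fetch almost every document, which is why the optimizer prefers to skip the index.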