I have post
documents like this :
post = {
"_id": "uid1",
"userID": "user1",
"likeNum": 30,
}
My current pipeline looks like this with user_ids
an array of user IDs to avoid and seen_ids
also an array of post IDs to avoid:
fake_pipeline = [
{'$match': {'userID': {'$nin': user_ids}, '_id': {"$nin": seen_ids}}},
{'$group': {'_id': '$userID', 'posts': {'$push': '$likeNum'}}},
{'$project': {'posts': {'$slice': [{'$sortArray': {'input': '$posts', 'sortBy': {'likeNum': -1}}}, 2]}}},
{'$limit': 10}
]
I would like the aggregation to return a list of 10 post
ordered by likeNum
descending
but with at max 2 post
per userID
.
Example :
post_list_in_db = [
{"_id": "uid1", "userID": "user1", "likeNum": 29},
{"_id": "uid2", "userID": "user1", "likeNum": 2},
{"_id": "uid3", "userID": "user1", "likeNum": 13},
{"_id": "uid4", "userID": "user2", "likeNum": 21},
{"_id": "uid5", "userID": "user2", "likeNum": 19},
{"_id": "uid6", "userID": "user3", "likeNum": 1},
{"_id": "uid7", "userID": "user3", "likeNum": 8},
{"_id": "uid8", "userID": "user3", "likeNum": 14},
{"_id": "uid9", "userID": "user3", "likeNum": 4},
{"_id": "uid10", "userID": "user4", "likeNum": 20},
{"_id": "uid11", "userID": "user4", "likeNum": 9},
{"_id": "uid12", "userID": "user4", "likeNum": 11},
]
The expected output is :
[
{"_id": "uid1", "userID": "user1", "likeNum": 29},
{"_id": "uid4", "userID": "user2", "likeNum": 21},
{"_id": "uid10", "userID": "user4", "likeNum": 20},
{"_id": "uid5", "userID": "user2", "likeNum": 19},
{"_id": "uid8", "userID": "user3", "likeNum": 14},
{"_id": "uid3", "userID": "user1", "likeNum": 13},
{"_id": "uid12", "userID": "user4", "likeNum": 11},
{"_id": "uid7", "userID": "user3", "likeNum": 8},
]
Note : for performance reason, i would like the minimum code in the aggregation
, no need to format the data perfectly like in the example output unless it has no impact on performances, otherwise i prefer reorder / transform objects myself in the code as long as i have the correct post
list content (but not especially formatted or ordered) as output of the aggregation.
You're quite close to what you need.
Sort on $likeNum
after the $match
stage.
Then when you $group
on userID, you can use the $firstN
aggregation accumulator selecting just the first 2 posts per user. So these will be the highest 2 likeNum
's per UserID (since we sorted before this).
"$$ROOT"
to get the actual post document rather than just the likeNum
.Then unwind the accumulated "$posts"
, replace those as the docs themselves, and sort again by likeNum
.
(The parts in blue are the differences from your pipeline.)
db.post.aggregate([
{"$match": {"userID": {"$nin": user_ids}, "_id": {"$nin": seen_ids}}},
{ $sort: { likeNum: -1 } },
{
"$group": {
"_id": "$userID",
"posts": {
$firstN: {
input: "$$ROOT",
n: 2
}
}
}
},
{ $unwind: "$posts" },
{ $replaceWith: "$posts" },
{ $sort: { likeNum: -1 } },
{ "$limit": 10 }
])
Mongo Playground. (No changes to the first $match
stage, I've just put ["user3"]
and ["uid9"]
as values in the playground example.)
Wrt "i prefer reorder / transform objects myself in the code" - The first sort cannot/should not be done in code, since it's for all posts after the match. The second $sort
could be done in code but it's likely to be slower than letting MongoDB sort and then select just 10 results. Otherwise, you need to fetch all results and then sort & limit yourself. And if you plan to have pagination, the skip + limit step have to be done by MongoDB after the sort is also done by MongoDB.