mongodbpymongo

Write a single query from multiple queries in MongoDB


I have 2 collections users and files. A user document looks like this :

{
  "_id": "user1",
  "name": "John"
}

A file document looks like this :

{
  "_id": "fileID1",
  "url": "any_url",
  "userID": "user1"
}

I have a list of user IDs let's say : ["user1", "user2", "user5"] and i want to perform a single query that takes this list as input and returns 5 files of each user ID.

For now i have a naive logic that is not really performant :

result_users = users_collection.find({}).limit(15)
  for user in result_users:
    user_id.append(user["_id"])
    result_files = files_collection.find({"userID": user["_id"]}).limit(5)
      for f in result_files:
        file_dict[user["_id"]].append(f)

The query i am looking for would naively be something like : result_files = files_collection.find({"userID": {"$in / $each": user_ids_list}).limit(5)

The output i am looking for is :

{
  "user1": ["fileID1", "fileID2", "fileID8", "fileID3"],
  "user2": ["fileID4", "fileID5", "fileID6", "fileID9", "fileID7"],
  ...
}

With at max 5 files per userID.

I am pretty sure this is achievable with aggregation but i have no clue how to write it correctly.

Note : i don't really care about the output format, i only need a list of file that is sorted correctly with 5 files per userID.

I can map the output myself when i am confident the query returns the right files.

Thanks !

Edit : it would be amazing to add a sort("at", DESCENDING) on the files collection aggregation


Solution

  • since you already have the user names in the file docs you won't need to query user collection.

    1. First narrow down the files using a $match based on existence of user in input array
    2. Group by the userID field and push the file id and the timestamp (will need later for sorting)
    3. Now call $sortArray on the grouped file array (v). Now it is sorted in descending(-1). On that sorted call $slice and finally map over the sliced array to get an array of just the ids. this will now generate in the format [ { _id:"user1",v: [...]}, { _id:"user2",v: [...]}... ]
    db.files.aggregate([
      {
        $match: { userID: { $in: ["user1", "user2", "user5"] } }
      },
      {
        $group: {
          _id: "$userID",
          v: { $push: { id: "$_id", at: "$at" } }
        }
      },
      {
        $project: {
          v: {
            $map: {
              input: { $slice: [{ $sortArray: { input: "$v", sortBy: { at: -1 } } }, 5] },
              in: "$$this.id"
            }
          }
        }
      }
    ])
    
    

    playground

    if you want to further convert it to { "user1": [...], "user2": [...], ... } format you can add these 2 stages

      {
        $group: {
          _id: null,
          kv: { $push: { k: "$_id", v: "$v" } }
        }
      },
      {
        $replaceRoot: { newRoot: { $arrayToObject: "$kv" } }
      }
    
    

    playground