node.js, mongodb, mongodb-query, random-data

Fetching Random MongoDB Documents Equally Across Multiple Chapters


I have a MongoDB collection with documents structured like this:

{
    "index": 23,
    "chapter": "b11"
},
{
    "index": 25,
    "chapter": "b11"
},
{
    "index": 26,
    "chapter": "b14"
},
{
    "index": 27,
    "chapter": "b14"
},
{
    "index": 28,
    "chapter": "b16"
}

Each document has an "index" field and a "chapter" field that holds the chapter name.

I need to fetch a random set of documents (e.g., 20 random documents) from specific chapters such as "b11" and "b16". The number of chapters in a query may vary (e.g., three chapters such as "b11", "b12", and "m14"). Crucially, each of the specified chapters must contribute the same number of random documents.

I understand that the total number of documents I need to retrieve should be divisible by the number of chapters I'm querying.
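To pin down the requirement, here is a minimal in-memory sketch in plain JavaScript, assuming the documents are already loaded into an array. `pickBalancedRandom` is a hypothetical helper, not part of MongoDB or the driver. It checks divisibility, then for each chapter sorts that chapter's documents by a random key and keeps the first `total / chapters.length`, which is a uniform random pick per chapter.

```javascript
// Hypothetical helper (not a MongoDB or driver API): picks `total` documents
// from an in-memory array, taking the same number from each listed chapter.
function pickBalancedRandom(docs, chapters, total, rand = Math.random) {
  if (chapters.length === 0 || total % chapters.length !== 0) {
    throw new Error(
      `total (${total}) must be divisible by the number of chapters (${chapters.length})`
    );
  }
  const perChapter = total / chapters.length;
  const picked = [];
  for (const chapter of chapters) {
    // Give every document in this chapter a random sort key, sort by it, and
    // keep the first `perChapter` documents of that random order.
    // A chapter with fewer documents than `perChapter` contributes all it has.
    const chosen = docs
      .filter((d) => d.chapter === chapter)
      .map((d) => ({ d, key: rand() }))
      .sort((a, b) => a.key - b.key)
      .slice(0, perChapter)
      .map(({ d }) => d);
    picked.push(...chosen);
  }
  return picked;
}
```

For example, `pickBalancedRandom(docs, ["b11", "b16"], 20)` would return 10 random documents from each of the two chapters. Loading the whole collection into memory does not scale, which is why the question asks for a MongoDB-side solution.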

What's the most efficient way to achieve this using MongoDB and Node.js as my backend? Any code examples or insights would be greatly appreciated. Thank you!

I tried fetching random documents from the collection with a basic query, but that does not guarantee an equal distribution across the specified chapters, and the results were uneven. I'm looking for a way to ensure that every requested chapter contributes the same number of random documents.


Solution

  • I was thinking of using $sample, since @Fourchette's comment made sense to me. However, $sample cannot take a variable as its argument: its size must be a constant positive integer. (See this.)

    So the aggregation pipeline has to do the following:

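    1. $match only the chapters you want.
    2. $group the matches to get the distinct set of matched chapters (and hence their count).
    3. $divide your target number of documents evenly across the matched chapters.
    4. Perform a self-$lookup per chapter. Inside it, give each document a random key with $rand, then use $setWindowFields to compute each document's $rank by that key within the chapter. ($setWindowFields, and $lookup with both localField/foreignField and a pipeline, require MongoDB 5.0+.)
    5. Keep only documents whose rank <= the number per chapter. E.g., if you want 10 documents in total and 2 chapters match, keep those with rank <= 10 / 2 = 5. If the division is not exact, this effectively rounds down.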
    db.collection.aggregate([
      // 1. Keep only documents from the requested chapters
      {
        "$match": {
          chapter: {
            $in: [
              "b11",
              "b14"
            ]
          }
        }
      },
      // 2. Collect the distinct matched chapter names into one array
      {
        $group: {
          _id: null,
          chapter: {
            $addToSet: "$chapter"
          }
        }
      },
      // 3. Split the target total (10 here) evenly across the matched chapters
      {
        $set: {
          numPerChapter: {
            "$divide": [
              10,
              {
                $size: "$chapter"
              }
            ]
          }
        }
      },
      // One document per matched chapter, each carrying numPerChapter
      {
        "$unwind": "$chapter"
      },
      // 4. Self-lookup: rank each chapter's documents by a random key
      {
        "$lookup": {
          "from": "collection", // name of this same collection (self-lookup)
          "let": {
            numPerChapter: "$numPerChapter"
          },
          "localField": "chapter",
          "foreignField": "chapter",
          "pipeline": [
            {
              $set: {
                randKey: {
                  "$rand": {}
                }
              }
            },
            {
              "$setWindowFields": {
                "sortBy": {
                  "randKey": 1
                },
                "output": {
                  "rank": {
                    $rank: {}
                  }
                }
              }
            },
            // 5. Keep the first numPerChapter documents of the random order
            {
              "$match": {
                $expr: {
                  $lte: [
                    "$rank",
                    "$$numPerChapter"
                  ]
                }
              }
            },
            {
              "$unset": [
                "randKey",
                "rank"
              ]
            }
          ],
          "as": "picked"
        }
      },
      // Flatten each chapter's picked array back into top-level documents
      {
        "$unwind": "$picked"
      },
      {
        "$replaceRoot": {
          "newRoot": "$picked"
        }
      }
    ])
    

    Mongo Playground
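Since the backend is Node.js, here is a sketch of the same pipeline with the chapter list and target total passed in instead of hard-coded. `buildBalancedPipeline` and `fetchBalancedRandom` are hypothetical helper names. The only driver calls assumed are `Collection.aggregate(pipeline)`, `AggregationCursor.toArray()`, and `Collection.collectionName` from the official `mongodb` package. Passing the collection's own name keeps the `$lookup` a self-lookup.

```javascript
// Hypothetical helper: the answer's pipeline, parameterized.
// `collectionName` must be the collection being aggregated (self-lookup).
function buildBalancedPipeline(collectionName, chapters, total) {
  return [
    { $match: { chapter: { $in: chapters } } },
    { $group: { _id: null, chapter: { $addToSet: "$chapter" } } },
    // Divides by the number of chapters that actually matched.
    { $set: { numPerChapter: { $divide: [total, { $size: "$chapter" }] } } },
    { $unwind: "$chapter" },
    {
      $lookup: {
        from: collectionName,
        let: { numPerChapter: "$numPerChapter" },
        localField: "chapter",
        foreignField: "chapter",
        pipeline: [
          { $set: { randKey: { $rand: {} } } },
          { $setWindowFields: { sortBy: { randKey: 1 }, output: { rank: { $rank: {} } } } },
          { $match: { $expr: { $lte: ["$rank", "$$numPerChapter"] } } },
          { $unset: ["randKey", "rank"] },
        ],
        as: "picked",
      },
    },
    { $unwind: "$picked" },
    { $replaceRoot: { newRoot: "$picked" } },
  ];
}

// Hypothetical wrapper: `collection` is a driver Collection, e.g.
// client.db("mydb").collection("collection") (database name is illustrative).
async function fetchBalancedRandom(collection, chapters, total) {
  // Checks divisibility against the *requested* chapters; the pipeline itself
  // divides by the *matched* ones, so a chapter with no documents changes the split.
  if (total % chapters.length !== 0) {
    throw new Error("total must be divisible by the number of chapters");
  }
  return collection
    .aggregate(buildBalancedPipeline(collection.collectionName, chapters, total))
    .toArray();
}
```

Usage would look like `await fetchBalancedRandom(coll, ["b11", "b12", "m14"], 21)`, which asks for 7 random documents from each chapter.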
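A different technique (not the approach above) sidesteps the $sample limitation. The Node.js code already knows the chapter list, so it can compute the per-chapter count in JavaScript and inline it as a literal `$sample` size. One `$match` + `$sample` branch per chapter is then chained with `$unionWith` (MongoDB 4.4+). `buildUnionSamplePipeline` is a hypothetical helper name. Unlike the pipeline above, it divides by the number of *requested* chapters, so a chapter with no documents simply yields fewer results.

```javascript
// Hypothetical helper (alternative technique): one $sample per chapter,
// with the size computed client-side so $sample receives a plain number.
function buildUnionSamplePipeline(collectionName, chapters, total) {
  if (chapters.length === 0 || total % chapters.length !== 0) {
    throw new Error("total must be divisible by the number of chapters");
  }
  const size = total / chapters.length; // a literal, not an expression
  const sampleChapter = (chapter) => [
    { $match: { chapter } },
    { $sample: { size } },
  ];
  const [first, ...rest] = chapters;
  return [
    // Random sample from the first chapter...
    ...sampleChapter(first),
    // ...then append an independently sampled batch for each other chapter.
    ...rest.map((chapter) => ({
      $unionWith: { coll: collectionName, pipeline: sampleChapter(chapter) },
    })),
  ];
}
```

It would be run like the previous pipeline, e.g. `await collection.aggregate(buildUnionSamplePipeline(collection.collectionName, chapters, total)).toArray()`. An index on `chapter` should help each branch's `$match`.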