I have a MongoDB collection with documents structured like this:
{
  "index": 23,
  "chapter": "b11"
},
{
  "index": 25,
  "chapter": "b11"
},
{
  "index": 26,
  "chapter": "b14"
},
{
  "index": 27,
  "chapter": "b14"
},
{
  "index": 28,
  "chapter": "b16"
}
In my collection, I have documents with an "index" field and a "chapter" field representing different chapter names.
I need to fetch a random set of documents (e.g., 20 random documents) restricted to specific chapter names such as "b11" and "b16". The number of chapters I query may vary (e.g., I might need three chapters such as "b11", "b12", and "m14"). What's crucial is that I get an equal number of random documents for each of the specified chapter names.
I understand that the total number of documents I need to retrieve should be divisible by the number of chapters I'm querying.
What's the most efficient way to achieve this using MongoDB and Node.js as my backend? Any code examples or insights would be greatly appreciated. Thank you!
I have attempted to fetch random documents from the MongoDB collection using a basic query, but this doesn't guarantee an equal distribution of documents across the specified chapter names. I was expecting to retrieve a balanced number of documents for each chapter name, but the results were uneven. I'm now looking for guidance on how to implement a solution that ensures an equal distribution of random documents for the provided chapter names.
I was thinking of using $sample, since @Fourchette's comment made sense to me. However, I found out that $sample cannot take a variable as its argument. (See this)
So I do the following in the aggregation pipeline instead:
1. $match only the chapters you want.
2. $divide your target number of documents evenly across the fetched chapters.
3. $lookup back into the same collection. Use $setWindowFields to compute $rank within a chapter, and use $rand as the random sort key (tiebreaker).
db.collection.aggregate([
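  // 1. Keep only the chapters of interest.
  {
    "$match": {
      chapter: {
        $in: [
          "b11",
          "b14"
        ]
      }
    }
  },
  // Collect the distinct chapters that actually exist.
  {
    $group: {
      _id: null,
      chapter: {
        $addToSet: "$chapter"
      }
    }
  },
  // 2. Split the target total (10 here) evenly across those chapters.
  {
    $set: {
      numPerChapter: {
        "$divide": [
          10,
          {
            $size: "$chapter"
          }
        ]
      }
    }
  },
  // One document per chapter, each carrying numPerChapter.
  {
    "$unwind": "$chapter"
  },
  // 3. For each chapter, rank its documents by a random key and keep the top numPerChapter.
  {
    "$lookup": {
      "from": "collection",
      "let": {
        numPerChapter: "$numPerChapter"
      },
      "localField": "chapter",
      "foreignField": "chapter",
      "pipeline": [
        {
          $set: {
            randKey: {
              "$rand": {}
            }
          }
        },
        {
          "$setWindowFields": {
            "sortBy": {
              "randKey": 1
            },
            "output": {
              "rank": {
                $rank: {}
              }
            }
          }
        },
        {
          "$match": {
            $expr: {
              $lte: [
                "$rank",
                "$$numPerChapter"
              ]
            }
          }
        },
        // Drop the helper fields before returning the documents.
        {
          "$unset": [
            "randKey",
            "rank"
          ]
        }
      ],
      "as": "picked"
    }
  },
  // Flatten the picked documents back to the top level.
  {
    "$unwind": "$picked"
  },
  {
    "$replaceRoot": {
      "newRoot": "$picked"
    }
  }
])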
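Since the question asks about Node.js and a varying list of chapters, here is a minimal sketch of how the same pipeline could be built from parameters. The names buildBalancedSamplePipeline and fetchBalanced are hypothetical helpers (not part of any library), and the sketch assumes the official mongodb Node.js driver and a server version that supports $setWindowFields and $lookup with both localField/foreignField and pipeline (MongoDB 5.0 or later, as far as I know).

```javascript
// Hypothetical helper: builds the same pipeline as above, but with the
// chapter list, the target total and the collection name as parameters.
function buildBalancedSamplePipeline(chapters, total, collectionName) {
  if (!Array.isArray(chapters) || chapters.length === 0) {
    throw new Error("chapters must be a non-empty array");
  }
  // Remove duplicate chapter names supplied by the caller.
  const uniqueChapters = [...new Set(chapters)];

  return [
    // Keep only the requested chapters.
    { $match: { chapter: { $in: uniqueChapters } } },
    // Collect the distinct chapters that actually exist.
    { $group: { _id: null, chapter: { $addToSet: "$chapter" } } },
    // Split the total evenly across the chapters found.
    { $set: { numPerChapter: { $divide: [total, { $size: "$chapter" }] } } },
    { $unwind: "$chapter" },
    {
      $lookup: {
        from: collectionName,
        let: { numPerChapter: "$numPerChapter" },
        localField: "chapter",
        foreignField: "chapter",
        pipeline: [
          // Rank each chapter's documents by a random key.
          { $set: { randKey: { $rand: {} } } },
          {
            $setWindowFields: {
              sortBy: { randKey: 1 },
              output: { rank: { $rank: {} } },
            },
          },
          // Keep only the first numPerChapter documents.
          { $match: { $expr: { $lte: ["$rank", "$$numPerChapter"] } } },
          { $unset: ["randKey", "rank"] },
        ],
        as: "picked",
      },
    },
    { $unwind: "$picked" },
    { $replaceRoot: { newRoot: "$picked" } },
  ];
}

// Hypothetical wrapper: `collection` is assumed to be a Collection object
// obtained from the mongodb driver, e.g. client.db("mydb").collection("questions").
async function fetchBalanced(collection, chapters, total) {
  const pipeline = buildBalancedSamplePipeline(
    chapters,
    total,
    collection.collectionName
  );
  return collection.aggregate(pipeline).toArray();
}
```

With this, a call such as fetchBalanced(collection, ["b11", "b16"], 20) would ask for up to 10 random documents per chapter. If a chapter has fewer documents than its share, or a requested chapter does not exist, the result will contain fewer than the requested total.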