I'm trying to get aggregations with same aggregation pipeline including $match
and $group
operations from multiple collections.
For example,
with a users
collection and collections of questions
, answers
and comments
where every document has authorId
and created_at
field,
db = [
'users': [{ _id: 123 }, { _id: 456} ],
'questions': [
{ authorId: ObjectId('123'), createdAt: ISODate('2022-09-01T00:00:00Z') },
{ authorId: ObjectId('456'), createdAt: ISODate('2022-09-05T00:00:00Z') },
],
'answers': [
{ authorId: ObjectId('123'), createdAt: ISODate('2022-09-05T08:00:00Z') },
{ authorId: ObjectId('456'), createdAt: ISODate('2022-09-01T08:00:00Z') },
],
'comments': [
{ authorId: ObjectId('123'), createdAt: ISODate('2022-09-01T16:00:00Z') },
{ authorId: ObjectId('456'), createdAt: ISODate('2022-09-05T16:00:00Z') },
],
]
I want to get counts of documents from each collections with created_at
between a given range and grouped by authorId
.
A desired aggregation result may look like below. The _id
s here are ObjectIds of documents in users
collection.
\\ match: { createdAt: { $gt: ISODate('2022-09-03T00:00:00Z) } }
[
{ _id: ObjectId('123'), questionCount: 0, answerCount: 1, commentCount: 0 },
{ _id: ObjectId('456'), questionCount: 1, answerCount: 0, commentCount: 1 }
]
Currently, I am running aggregation below for each collection, combining the results in the backend service. (I am using Spring Data MongoDB Reactive.) This seems very inefficient.
db.collection.aggregate([
{ $match: {
created_at: { $gt: ISODate('2022-09-03T00:00:00Z') }
}},
{ $group : {
_id: '$authorId',
count: {$sum: 1}
}}
])
How can I get the desired result with one aggregation?
I thought $unionWith
or $lookup
may help but I'm stuck here.
You can try something like this, using $lookup, here we join users, with all the three collections one-by-one, and then calculate the count:
db.users.aggregate([
{
"$lookup": {
"from": "questions",
"let": {
id: "$_id"
},
"pipeline": [
{
"$match": {
$expr: {
"$and": [
{
"$gt": [
"$createdAt",
ISODate("2022-09-03T00:00:00Z")
]
},
{
"$eq": [
"$$id",
"$authorId"
]
}
]
}
}
}
],
"as": "questions"
}
},
{
"$lookup": {
"from": "answers",
"let": {
id: "$_id"
},
"pipeline": [
{
"$match": {
$expr: {
"$and": [
{
"$gt": [
"$createdAt",
ISODate("2022-09-03T00:00:00Z")
]
},
{
"$eq": [
"$$id",
"$authorId"
]
}
]
}
}
}
],
"as": "answers"
}
},
{
"$lookup": {
"from": "comments",
"let": {
id: "$_id"
},
"pipeline": [
{
"$match": {
$expr: {
"$and": [
{
"$gt": [
"$createdAt",
ISODate("2022-09-03T00:00:00Z")
]
},
{
"$eq": [
"$$id",
"$authorId"
]
}
]
}
}
}
],
"as": "comments"
}
},
{
"$project": {
"questionCount": {
"$size": "$questions"
},
"answersCount": {
"$size": "$answers"
},
"commentsCount": {
"$size": "$comments"
}
}
}
])
Playground link. In the above query, we use pipelined form of $lookup, to perform join on some custom logic. Learn more about $lookup here.
Another way is this, perform normal lookup and then filter out the elements:
db.users.aggregate([
{
"$lookup": {
"from": "questions",
"localField": "_id",
"foreignField": "authorId",
"as": "questions"
}
},
{
"$lookup": {
"from": "answers",
"localField": "_id",
"foreignField": "authorId",
"as": "answers"
}
},
{
"$lookup": {
"from": "comments",
"localField": "_id",
"foreignField": "authorId",
"as": "comments"
}
},
{
"$project": {
questionCount: {
"$size": {
"$filter": {
"input": "$questions",
"as": "item",
"cond": {
"$gt": [
"$$item.createdAt",
ISODate("2022-09-03T00:00:00Z")
]
}
}
}
},
answerCount: {
"$size": {
"$filter": {
"input": "$answers",
"as": "item",
"cond": {
"$gt": [
"$$item.createdAt",
ISODate("2022-09-03T00:00:00Z")
]
}
}
}
},
commentsCount: {
"$size": {
"$filter": {
"input": "$comments",
"as": "item",
"cond": {
"$gt": [
"$$item.createdAt",
ISODate("2022-09-03T00:00:00Z")
]
}
}
}
}
}
}
])
Playground link.