mongodbmongodb-queryaggregation-frameworkspring-data-mongodbspring-data-mongodb-reactive

MongoDB: how to aggregate from multiple collections with same aggregation pipeline


I'm trying to get aggregations with same aggregation pipeline including $match and $group operations from multiple collections.

For example,

with a users collection and collections of questions, answers and comments where every document has authorId and created_at field,

db = [
    'users': [{ _id: 123 }, { _id: 456} ],
    'questions': [
        { authorId: ObjectId('123'), createdAt: ISODate('2022-09-01T00:00:00Z') },
        { authorId: ObjectId('456'), createdAt: ISODate('2022-09-05T00:00:00Z') },
    ],

    'answers': [
        { authorId: ObjectId('123'), createdAt: ISODate('2022-09-05T08:00:00Z') },
        { authorId: ObjectId('456'), createdAt: ISODate('2022-09-01T08:00:00Z') },
    ],
    'comments': [
        { authorId: ObjectId('123'), createdAt: ISODate('2022-09-01T16:00:00Z') },
        { authorId: ObjectId('456'), createdAt: ISODate('2022-09-05T16:00:00Z') },
    ],
]

I want to get counts of documents from each collections with created_at between a given range and grouped by authorId. A desired aggregation result may look like below. The _ids here are ObjectIds of documents in users collection.

\\ match: { createdAt: { $gt: ISODate('2022-09-03T00:00:00Z) } }
[
    { _id: ObjectId('123'), questionCount: 0, answerCount: 1, commentCount: 0 }, 
    { _id: ObjectId('456'), questionCount: 1, answerCount: 0, commentCount: 1 }
]

Currently, I am running aggregation below for each collection, combining the results in the backend service. (I am using Spring Data MongoDB Reactive.) This seems very inefficient.

db.collection.aggregate([
    { $match: { 
        created_at: { $gt: ISODate('2022-09-03T00:00:00Z') }
    }},
    { $group : {
        _id: '$authorId',
        count: {$sum: 1}
    }}
])

How can I get the desired result with one aggregation?

I thought $unionWith or $lookup may help but I'm stuck here.


Solution

  • You can try something like this, using $lookup, here we join users, with all the three collections one-by-one, and then calculate the count:

    db.users.aggregate([
      {
        "$lookup": {
          "from": "questions",
          "let": {
            id: "$_id"
          },
          "pipeline": [
            {
              "$match": {
                $expr: {
                  "$and": [
                    {
                      "$gt": [
                        "$createdAt",
                        ISODate("2022-09-03T00:00:00Z")
                      ]
                    },
                    {
                      "$eq": [
                        "$$id",
                        "$authorId"
                      ]
                    }
                  ]
                }
              }
            }
          ],
          "as": "questions"
        }
      },
      {
        "$lookup": {
          "from": "answers",
          "let": {
            id: "$_id"
          },
          "pipeline": [
            {
              "$match": {
                $expr: {
                  "$and": [
                    {
                      "$gt": [
                        "$createdAt",
                        ISODate("2022-09-03T00:00:00Z")
                      ]
                    },
                    {
                      "$eq": [
                        "$$id",
                        "$authorId"
                      ]
                    }
                  ]
                }
              }
            }
          ],
          "as": "answers"
        }
      },
      {
        "$lookup": {
          "from": "comments",
          "let": {
            id: "$_id"
          },
          "pipeline": [
            {
              "$match": {
                $expr: {
                  "$and": [
                    {
                      "$gt": [
                        "$createdAt",
                        ISODate("2022-09-03T00:00:00Z")
                      ]
                    },
                    {
                      "$eq": [
                        "$$id",
                        "$authorId"
                      ]
                    }
                  ]
                }
              }
            }
          ],
          "as": "comments"
        }
      },
      {
        "$project": {
          "questionCount": {
            "$size": "$questions"
          },
          "answersCount": {
            "$size": "$answers"
          },
          "commentsCount": {
            "$size": "$comments"
          }
        }
      }
    ])
    

    Playground link. In the above query, we use pipelined form of $lookup, to perform join on some custom logic. Learn more about $lookup here.

    Another way is this, perform normal lookup and then filter out the elements:

    db.users.aggregate([
      {
        "$lookup": {
          "from": "questions",
          "localField": "_id",
          "foreignField": "authorId",
          "as": "questions"
        }
      },
      {
        "$lookup": {
          "from": "answers",
          "localField": "_id",
          "foreignField": "authorId",
          "as": "answers"
        }
      },
      {
        "$lookup": {
          "from": "comments",
          "localField": "_id",
          "foreignField": "authorId",
          "as": "comments"
        }
      },
      {
        "$project": {
          questionCount: {
            "$size": {
              "$filter": {
                "input": "$questions",
                "as": "item",
                "cond": {
                  "$gt": [
                    "$$item.createdAt",
                    ISODate("2022-09-03T00:00:00Z")
                  ]
                }
              }
            }
          },
          answerCount: {
            "$size": {
              "$filter": {
                "input": "$answers",
                "as": "item",
                "cond": {
                  "$gt": [
                    "$$item.createdAt",
                    ISODate("2022-09-03T00:00:00Z")
                  ]
                }
              }
            }
          },
          commentsCount: {
            "$size": {
              "$filter": {
                "input": "$comments",
                "as": "item",
                "cond": {
                  "$gt": [
                    "$$item.createdAt",
                    ISODate("2022-09-03T00:00:00Z")
                  ]
                }
              }
            }
          }
        }
      }
    ])
    

    Playground link.