mongodbduplicatesaggregator

Find Duplicate In MongoDB While Comparing Some Field In those Duplicate Items


I Have a DB

[
  {
    "_id": 1,
    "email": "amrit@gmail.com",
    "status": "ACTIVE"
  },
  {
    "_id": 2,
    "email": "abc@gmail.com",
    "status": "INACTIVE"
  },
  {
    "_id": 3,
    "email": "tut@gmail.com",
    "status": "ACTIVE"
  },
  {
    "_id": 4,
    "email": "amrit@gmail.com",
    "status": "INACTIVE"
  },
  {
    "_id": 5,
    "email": "tut@gmail.com",
    "status": "ACTIVE"
  },
  {
    "_id": 6,
    "email": "cat@gmail.com",
    "status": "ACTIVE"
  },
  
]

Now I want to find the item according to emails, which have status as both ACTIVE and INACTIVE. I have written the query to find duplicates like this.

db.getCollection(‘employees’).aggregate([
    {$group: {
        _id: {email: “$email”},
        uniqueIds: {$addToSet: “$_id”},
        count: {$sum: 1}
        }
    },
    {$match: {
        count: {“$gt”: 1}
        }
    }
], {allowDiskUse:true });

This return both tut@gmail.com and amrit@gmail.com but I only want amrit@gmail.com as it as both ACTIVE and INACTIVE in db. Result should look like

{
    "_id": {
      "email": "amrit@gmail.com"
    },
    "uniqueIds": [
      4,
      1
    ]
  }

Solution

  • Try the below query.

    db.getCollection("employees").aggregate([
      {
        $group: {
          _id: {
            email: "$email"
          },
          uniqueIds: {
            $addToSet: "$_id"
          },
          status: {
            $addToSet: "$status"
          }
        }
      },
      {
        $match: {
          status: {
            "$all": [
              "ACTIVE",
              "INACTIVE"
            ]
          }
        }
      },
      {
        $project: {
          status: 0
        }
      }
    ])
    

    Here is MongoPlayground for you.