mongodbpymongomql

MongoDB MQL, split list in two and get only unique values


I'm having a books collection in MongoDB.

Each has a categories list up to 2 entries, example :

categories: [
   'Thriller'
]

or

categories: [
   'Adventure',
   'Action'
]

Using MongoDB MQL I need to get two categories lists filtered to project :

I can only use $group, $addToSet and $arrayElemAt (no $unwind).

Here where I am and I can't find the way to do it :

collection.aggregate(
    [
        {"$group":
            {
                "_id": "$_id",
                "categories1" : { "$addToSet": { "$arrayElemAt": [ "$categories", 0 ] } },
                "categories2" : { "$addToSet": { "$arrayElemAt": [ "$categories", 1 ] } }
            }
        }
    ]
)

Example of entry in books collection :

{
    _id: 2,
    title: 'HARRY POTTER A L'ECOLE DES SORCIERS - ILLUSTRE PAR MINALIMA',
    isbn: '2075145938',
    pageCount: 368,
    publishedDate: ISODate('2020-10-22T08:00:00.000Z'),
    shortDescription: 'Découvrez ou redécouvrez le texte intégral...',
    status: 'PUBLISH',
    authors: [
        'J.K. Rowling',
        'Minalima'
    ],
    categories: [
        'Youth',
        'Adventure'
    ]
}

Expected output :

{
    categories1 : [
        'Youth',
        'Thriller',
        'Newspaper'],
    categories2 : [
        'Adventure',
        'Newspaper',
        'Essai'],

}

categories1 include only unique values from categories with 0 index (first value) and categories2 include only unique values from categories but this time in second position (index 1).

Any idea ?

Thanks!


Solution

  • Your query is almost correct. You only need to change the _id for $group. Since you want the unique lists across all books, don't group on $_id - that would treat each object/book individually. Use null to group across all books in the collection:

    db.collection.aggregate(
        [
            {"$group":
                {
                    "_id": null,  // this is the only line I changed
                    "categories1" : { "$addToSet": { "$arrayElemAt": [ "$categories", 0 ] } },
                    "categories2" : { "$addToSet": { "$arrayElemAt": [ "$categories", 1 ] } }
                }
            },
            { "$project": { "_id": 0 } }
        ]
    )
    

    And since you don't want _id: null in the result, I've added a second stage $project in the pipeline.

    Btw, if you're putting that aggregation pipeline directly in Python, change the "_id": null to "_id": None

    Mongo Playground

    example data:

    [
      {
        title: "book one title",
        categories: [ "Thriller" ]
      },
      {
        title: "book TWO title",
        categories: [ "Adventure", "Action" ]
      },
      {
        title: "book 3 title",
        categories: [ "Action", "Musical", "Animated" ]
      },
      {
        title: "fourth book",
        categories: [ "Thriller", "Musical" ] // both of these are duplicate categories
      }
    ]
    

    Output:

    [
      {
        "categories1": [ "Thriller", "Adventure", "Action" ],
        "categories2": [ "Musical", "Action" ]
      }
    ]