arraysmongodbaggregation-frameworkaggregationmongodb-update

How to use $setUnion on array of object with different object order for updating document


I have this document:

{
  "_id": "65b294740c5693e3deac0cea",
  "name": "Islamic Republic Of Iran",
  "population": 95000000,
  "abb": "IRI",
  "cities": [
    {
      "_id": "65b294740c5693e3deac0cf6",
      "name": "Hamedan",
      "population": 10,
      "abb": "HM"
    },
    {
      "_id": "65b294750c5693e3deac0cf7",
      "name": "Tehran",
      "population": 50,
      "abb": "TH"
    },
    {
      "_id": "65b294750c5693e3deac0cf8",
      "name": "Kerman",
      "population": 12,
      "abb": "KM"
    },
    {
      "_id": "65b294750c5693e3deac0cf9",
      "name": "Esfahan",
      "population": 25,
      "abb": "ES"
    },
    {
      "_id": "65b294750c5693e3deac0cfa",
      "name": "Mashhad",
      "population": 28,
      "abb": "MS"
    }
  ],
}

and I want to update the document and save 5 unique cities on the cities field by the following array:

[
    {
        _id: "65b294750c5693e3deac0cfa",
        name: "Mashhad",
        abb: "MS",
        population: 28
    },
    {
        _id: "65b294750c5693e3deac0cfb",
        name: "Tabriz",
        abb: "TB",
        population: 27
    }
]

If you pay attention, the order of the abb and population fields in this array is not the same as the first array.
I use this update pipeline to update the cities field:

[
  {
    "$set": {
      "cities": {
        "$filter": {
          "input": "$cities",
          "as": "citiesItem",
          "cond": {
            "$ne": [
              "$$citiesItem._id",
              "65b294750c5693e3deac0cf8"
            ]
          }
        }
      }
    }
  },
  {
    "$set": {
      "cities": {
        "$setUnion": [
          "$cities",
          [
            {
              "_id": "65b294750c5693e3deac0cfa",
              "name": "Mashhad",
              "abb": "MS",
              "population": 28
            },
            {
              "_id": "65b294750c5693e3deac0cfb",
              "name": "Tabriz",
              "abb": "TB",
              "population": 27
            }
          ]
        ]
      }
    }
  },
  {
    "$set": {
      "cities": {
        "$sortArray": {
          "input": "$cities",
          "sortBy": {
            "_id": 1
          }
        }
      }
    }
  },
  {
    "$set": {
      "cities": {
        "$slice": [
          "$cities",
          5
        ]
      }
    }
  }
]

I expect this object:

{
    "_id": "65b294750c5693e3deac0cf8",
    "name": "Kerman",
    "population": 12,
    "abb": "KM"
}

to be erased and this object:

{
    _id: "65b294750c5693e3deac0cfb",
    name: "Tabriz",
    abb: "TB",
    population: 27
}

is added to the end of the array of cities.

But due to the different order of the fields, instead of the city of "Tabriz", the city of "Mashhad" is stored twice at the end of the cities field as follows:

{
  "_id": "65b294740c5693e3deac0cea",
  "name": "Islamic Republic Of Iran",
  "population": 95000000,
  "abb": "IRI",
  "cities": [
    {
      "_id": "65b294740c5693e3deac0cf6",
      "name": "Hamedan",
      "population": 10,
      "abb": "HM"
    },
    {
      "_id": "65b294750c5693e3deac0cf7",
      "name": "Tehran",
      "population": 50,
      "abb": "TH"
    },
    {
      "_id": "65b294750c5693e3deac0cf9",
      "name": "Esfahan",
      "population": 25,
      "abb": "ES"
    },
    {
      "_id": "65b294750c5693e3deac0cfa",
      "name": "Mashhad",
      "population": 28,
      "abb": "MS"
    },
    {
      "_id": "65b294750c5693e3deac0cfa",
      "name": "Mashhad",
      "abb": "MS",
      "population": 28
    }
  ],
}

I even tried switching the order of the fields with the pipeline below:

[
  {
    "$set": {
      "cities": {
        "$filter": {
          "input": "$cities",
          "as": "citiesItem",
          "cond": {
            "$ne": [
              "$$citiesItem._id",
              "65b294750c5693e3deac0cf8"
            ]
          }
        }
      }
    }
  },
  {
    "$set": {
      "cities": {
        "$setUnion": [
          {
            "$project": {
              "cities": {
                "$map": {
                  "input": "$cities",
                  "as": "orderedItemcities",
                  "in": {
                    "_id": "$$orderedItemcities._id",
                    "name": "$$orderedItemcities.name",
                    "population": "$$orderedItemcities.population",
                    "abb": "$$orderedItemcities.abb"
                  }
                }
              }
            }
          },
          [
            {
              "_id": "65b2976346bd44878b4ae02b",
              "name": "Mashhad",
              "abb": "MS",
              "population": 28
            },
            {
              "_id": "65b2976346bd44878b4ae02c",
              "name": "Tabriz",
              "abb": "TB",
              "population": 27
            }
          ]
        ]
      }
    }
  },
  {
    "$set": {
      "cities": {
        "$sortArray": {
          "input": "$cities",
          "sortBy": {
            "_id": 1
          }
        }
      }
    }
  },
  {
    "$set": {
      "cities": {
        "$slice": [
          "$cities",
          5
        ]
      }
    }
  }
]

But I got this error:

Invalid $set : : caused by : : Unrecognized expression '$project'

How can I create a proper pipeline for this?
Is there a way to uniquely combine two arrays in MongoDB based on one of the keys of an object?
For example, should we perform this combination according to the value of the _id key?


Solution

  • Based on your second query, what did I fix:

    1. Remove the $project and cities as not needed.

    2. The position of the abb field should come first before the population.

    3. The "Mashhad" _id should be "65b294750c5693e3deac0cfa" according to the document.

    db.collection.update({},
    [
      {
        "$set": {
          "cities": {
            "$filter": {
              "input": "$cities",
              "as": "citiesItem",
              "cond": {
                "$ne": [
                  "$$citiesItem._id",
                  "65b294750c5693e3deac0cf8"
                ]
              }
            }
          }
        }
      },
      {
        "$set": {
          "cities": {
            "$setUnion": [
              {
                "$map": {
                  "input": "$cities",
                  "as": "orderedItemcities",
                  "in": {
                    "_id": "$$orderedItemcities._id",
                    "name": "$$orderedItemcities.name",
                    "abb": "$$orderedItemcities.abb",
                    "population": "$$orderedItemcities.population"
                  }
                }
              },
              [
                {
                  "_id": "65b294750c5693e3deac0cfa",
                  "name": "Mashhad",
                  "abb": "MS",
                  "population": 28
                },
                {
                  "_id": "65b2976346bd44878b4ae02c",
                  "name": "Tabriz",
                  "abb": "TB",
                  "population": 27
                }
              ]
            ]
          }
        }
      },
      {
        "$set": {
          "cities": {
            "$sortArray": {
              "input": "$cities",
              "sortBy": {
                "_id": 1
              }
            }
          }
        }
      },
      {
        "$set": {
          "cities": {
            "$slice": [
              "$cities",
              5
            ]
          }
        }
      }
    ])
    

    Demo Solution 1 @ Mongo Playground

    If you are looking alternative to check the _id from the provided input array that exists in the current document and only add when it is not exist, you can use the below query:

    db.collection.update({},
    [
      {
        "$set": {
          "cities": {
            "$filter": {
              "input": "$cities",
              "as": "citiesItem",
              "cond": {
                "$ne": [
                  "$$citiesItem._id",
                  "65b294750c5693e3deac0cf8"
                ]
              }
            }
          }
        }
      },
      {
        "$set": {
          "cities": {
            $reduce: {
              input: [
                {
                  "_id": "65b294750c5693e3deac0cfa",
                  "name": "Mashhad",
                  "abb": "MS",
                  "population": 28
                },
                {
                  "_id": "65b294750c5693e3deac0cfb",
                  "name": "Tabriz",
                  "abb": "TB",
                  "population": 27
                }
              ],
              initialValue: "$cities",
              in: {
                $concatArrays: [
                  "$$value",
                  {
                    $ifNull: [
                      {
                        $cond: {
                          if: {
                            $not: {
                              $in: [
                                "$$this._id",
                                "$cities._id"
                              ]
                            }
                          },
                          then: [
                            "$$this"
                          ],
                          else: null
                        }
                      },
                      []
                    ]
                  }
                ]
              }
            }
          }
        }
      },
      {
        "$set": {
          "cities": {
            "$sortArray": {
              "input": "$cities",
              "sortBy": {
                "_id": 1
              }
            }
          }
        }
      },
      {
        "$set": {
          "cities": {
            "$slice": [
              "$cities",
              5
            ]
          }
        }
      }
    ])
    

    Demo Solution 2 @ Mongo Playground