python mongodb pymongo mongomock

Update records with new objects


Say I have the following MongoDB collection (I am using mongomock for this example so that it's easy to reproduce):

import mongomock

collection = mongomock.MongoClient().db.collection

objects = [{'name': 'Alice', 'age': 21}, {'name': 'Bob', 'age': 20}]
collection.insert_many(objects)

I would then like to update my existing objects with the fields from some new objects:

new_objects = [{'name': 'Alice', 'height': 170}, {'name': 'Caroline', 'height': 160}]

The only way I could think of doing this is:

for record in new_objects:
    if collection.find_one({'name': record['name']}) is not None:
        collection.update_one({'name': record['name']}, {'$set': {'height': record['height']}})
    else:
        collection.insert_one(record)

However, if new_objects is very large, this method becomes slow. Is there a way to use update_many for this?


Solution

  • You can't use update_many() here, because it applies a single filter and a single update to every matching document, whereas in your case each record needs its own filter.

    A simpler construct uses upsert=True, which removes the need for separate insert/update logic. Passing the whole record to $set also sets every field in the record, so there is less code to write:

    for record in objects + new_objects:
        collection.update_one({'name': record.get('name')}, {'$set': record}, upsert=True)
    

    If it slows down with a larger number of updates, make sure you have an index on the name field. In the mongo shell:

    db.collection.createIndex( { "name": 1 } )
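
For readers working from Python rather than the mongo shell, the same index can probably be requested through the collection's create_index method. This is only a minimal sketch: the helper name ensure_name_index is hypothetical, and it assumes collection is a PyMongo (or mongomock) collection object.

```python
def ensure_name_index(collection):
    """Request an ascending index on the `name` field.

    `collection` is assumed to be a PyMongo-style collection object.
    The value 1 means ascending, matching { "name": 1 } in the shell.
    """
    # Returns the name of the index as reported by the driver.
    return collection.create_index([('name', 1)])
```

Calling ensure_name_index(collection) once before the update loop should be enough; the per-record filter on name can then use the index instead of scanning the collection.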
    

    You can squeeze out a bit more performance by using a bulk_write operation, which sends the updates to the server in batches instead of making one round trip per record. Worked example:

    from pymongo import MongoClient, UpdateOne
    
    collection = MongoClient().db.collection
    
    objects = [{'name': 'Alice', 'age': 21}, {'name': 'Bob', 'age': 20}]
    new_objects = [{'name': 'Alice', 'height': 170}, {'name': 'Caroline', 'height': 160}]
    
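    # Build one upsert per record, keyed on the name field
    updates = []
    
    for record in objects + new_objects:
        updates.append(UpdateOne({'name': record.get('name')}, {'$set': record}, upsert=True))
    
    # Submit all the operations with a single bulk_write call
    collection.bulk_write(updates)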
    
    for record in collection.find({}, {'_id': 0}):
        print(record)
    

    Gives:

    {'name': 'Alice', 'age': 21, 'height': 170}
    {'name': 'Bob', 'age': 20}
    {'name': 'Caroline', 'height': 160}
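
If new_objects is too large to hold comfortably as one list of operations, one option is to split it into batches and call bulk_write once per batch. The sketch below is an illustration under assumptions rather than a tuned implementation: the helper names chunked and upsert_in_batches, the key parameter, and the batch size of 1000 are all hypothetical choices, and collection is assumed to be a PyMongo collection.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def upsert_in_batches(collection, records, key='name', batch_size=1000):
    """Upsert `records` into `collection`, one bulk_write call per batch."""
    # Imported here so that chunked() can be used without pymongo installed.
    from pymongo import UpdateOne

    for batch in chunked(records, batch_size):
        operations = [
            UpdateOne({key: record[key]}, {'$set': record}, upsert=True)
            for record in batch
        ]
        # ordered=False does not guarantee execution order within a batch,
        # so it only suits data where each key appears at most once per batch.
        collection.bulk_write(operations, ordered=False)
```

With the data from the example, upsert_in_batches(collection, objects + new_objects) would issue a single bulk_write, since both lists fit in one batch. If the same name can appear more than once in a batch and the order of updates matters, drop ordered=False so the default ordered behaviour applies.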