I'm working on a Python app that uses MongoDB (with pymongo for connection) and need to avoid race conditions when adding documents to a collection. Unfortunately, I can't use transactions because I don't control the database and thus can't set up replica sets.
From my understanding, operations like find_one_and_update() or update() are atomic (two concurrent operations cannot update the same document in a conflicting way), but I’m unsure about atomicity when using update() with upsert=True.
Here's my use case: I have a collection of tasks, where each task document contains a list of device IDs in a devices field. When inserting a new task, I need to ensure that no other task in the collection has any of the same devices listed. My current approach is to use the following code:
task = {'devices': [1, 2, 3], 'name': 'my_new_task'}
query = {"devices": {'$elemMatch': {'$in': task['devices']}}}
result = collection.update_one(query, {'$setOnInsert': task}, upsert=True)
if not result.upserted_id:
    print('Task was not upserted as there are other tasks with the same devices')
The idea is to insert the task only when no other task has any of the same devices. However, I suspect that this operation is not atomic, and there’s a chance for race conditions if multiple concurrent requests are made. Specifically, the query and insertion seem to happen in two steps, potentially allowing conflicting tasks to be inserted in parallel.
Am I correct that update_one() with upsert=True is not atomic in this case? If so, how can I ensure atomicity or prevent race conditions when adding tasks with device lists? (It is important for me to guarantee that two parallel requests cannot insert two tasks whose device lists intersect.) Any advice or alternative approaches would be greatly appreciated!
Update: I had overlooked that a unique index on an array field in MongoDB compares the individual elements, not the arrays as a whole (thanks @joe for the tip). As the docs say:
Upserts can create duplicate documents, unless there is a unique index to prevent duplicates.
Thus the best approach to prevent the race condition is to create a unique index. In my case I used a partial index to enforce uniqueness on 'devices' only among documents that also have 'status' = 'ACTIVE'. It rejects inserts and updates that would have led to conflicts. I used this snippet to set up the index:
collection.create_index(
    "devices",
    unique=True,
    partialFilterExpression={"status": "ACTIVE"}
)
Note, however, that a unique index does not prevent duplicate values inside the indexed array of a single document. In my case it still allows inserting a task with devices = [1, 1, 2], so keep that in mind.
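If repeated IDs within one task are a problem, one option is to clean the list on the application side before inserting. This is only a sketch, and dedupe_devices is a hypothetical helper name:

```python
def dedupe_devices(devices):
    """Return the device IDs with repeats removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(devices))


task = {'devices': dedupe_devices([1, 1, 2]), 'name': 'my_new_task'}
```

This only normalizes the list inside a single document. Conflicts between different tasks are still handled by the unique index.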