So, I have a Collection of documents (e.g. Person) structured in this way:
class Person(Document):
    name = StringField(max_length=200, required=True)
    nationality = StringField(max_length=200, required=True)
    earning = ListField(IntField())
When I save the document I only input the name and nationality fields, because this is the information I have.
Then, every now and then, I want to update the earning of each person of a particular nationality. Let's imagine that there is some formula that allows me to compute the earning field (e.g. I query some magical API called EarningAPI that returns the earning of a person given their name).
To update them I would do something like:
japanese_people = Person.objects(Q(nationality='Japanese')).all()
for japanese_person in japanese_people:
    japanese_person.earning.append(EarningAPI(japanese_person.name))
Person.objects.insert(japanese_people, load_bulk=False)
The EarningAPI can also work in batches, so that I can give it a list of names and it returns a list of earnings (one for each name). This method is far faster and less expensive.
Is the one by one way correct? What is the best way to take advantage of the batches?
Thanks
Using the method from "Mongoengine bulk update without objects.update()":
from pymongo import UpdateOne
from mongoengine import Document, IntField, ListField, Q, StringField

class Person(Document):
    name = StringField(max_length=200, required=True)
    nationality = StringField(max_length=200, required=True)
    earning = ListField(IntField())

japanese_people = Person.objects(Q(nationality='Japanese')).all()
japanese_ids = [person.id for person in japanese_people]
earnings = EarningAPI(japanese_ids)
# I'm assuming it takes a list of ids as input and returns a list of earnings, in the same order.
# One update per person; $push appends the new value to the earning list,
# where $set would overwrite the whole field.
bulk_operations = [
    UpdateOne(
        {'_id': j_id},
        {'$push': {'earning': earn}},
        upsert=True
    )
    for j_id, earn in zip(japanese_ids, earnings)
]
# ordered=False lets the remaining updates proceed even if one of them fails.
result = Person._get_collection().bulk_write(bulk_operations, ordered=False)
I can't be certain if this is faster than the one by one method because I don't have access to your magic API to benchmark, but this should be the way to do it by batch.
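Since the question says the batch form of the API takes names rather than ids, the same idea can be sketched with names, split into chunks so that each API call stays a manageable size. This is only a sketch under assumptions: `earning_api_batch` is a hypothetical stand-in for the batch form of EarningAPI (a list of names in, a list of earnings in the same order out), `chunked`, `build_push_updates` and `batch_size` are names made up here, and `$push` is used on the assumption that each new earning should be appended to the list, as in the question's loop. It builds plain (filter, update) pairs so it can run without a database.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_push_updates(people, earning_api_batch, batch_size=500):
    """Return (filter, update) pairs that append one new earning per person.

    `people` is an iterable of (id, name) tuples.
    `earning_api_batch` is a hypothetical callable: list of names in,
    list of earnings out, in the same order.
    """
    updates = []
    for chunk in chunked(people, batch_size):
        names = [name for _, name in chunk]
        earnings = earning_api_batch(names)  # one API call per chunk
        if len(earnings) != len(names):
            raise ValueError("API returned a different number of earnings than names")
        for (person_id, _), earning in zip(chunk, earnings):
            updates.append(({'_id': person_id}, {'$push': {'earning': earning}}))
    return updates


# Stand-in for the real batch API, only so the sketch runs on its own.
def fake_earning_api_batch(names):
    return [len(name) * 1000 for name in names]


people = [(1, 'Aiko'), (2, 'Kenji'), (3, 'Yui')]
updates = build_push_updates(people, fake_earning_api_batch, batch_size=2)
```

With real data, `people` could be built as `[(p.id, p.name) for p in Person.objects(nationality='Japanese').only('name')]`, and each pair wrapped as `UpdateOne(filter, update)` before the list is passed to `bulk_write` as above.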