TLDR: how can I prevent very large IN sets in the SQL query generated?
When I pre-fetch a ManyToMany field pointing to model Tag from a model Object:
obj_qs = models.Object.objects.filter(created_time__gt=2024)
obj_qs = obj_qs.prefetch_related('tags')
I get a pre-fetching SQL query of the following form:
SELECT (`main_object_tag`.`object_id`) AS `_prefetch_related_val_object_id`, `main_tag`.`id`, `main_tag`.`name` FROM `main_tag` INNER JOIN `main_object_tag` ON (`main_tag`.`id` = `main_object_tag`.`tag_id`) WHERE `main_object_tag`.`object_id` IN (1, 2, 3, 4, 5); args=(1, 2, 3, 4, 5)
The problem is that the IN clause can become huge this way: hundreds of thousands of ids in my case. Can I work around this somehow, for example by using a subquery instead of the explicit IN list?
Extra: why?
We subsequently serialize the objects as follows:
return JsonResponse(
    [
        {
            'name': obj.name,
            'tags': [tag.name for tag in obj.tags.all()],
        }
        for obj in obj_qs
    ],
    safe=False,  # required: the top-level value is a list, not a dict
)
If the result set is too large, you can work in chunks, fetching the queryset in batches of, say, 10,000 elements. You can use .iterator(…)
[Django-doc] for this. Note that .iterator() only honors prefetch_related from Django 4.1 onwards, and only when a chunk_size is passed:
return JsonResponse(
    [
        {'name': obj.name, 'tags': [tag.name for tag in obj.tags.all()]}
        for obj in obj_qs.iterator(chunk_size=10_000)
    ],
    safe=False,
)
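With chunk_size set, each underlying prefetch query's IN (...) list is bounded by the chunk size. The batching idea itself can be illustrated with a small standalone helper (plain Python, no Django required; `chunked` is a hypothetical name here, not part of Django's API):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Each batch would map to one SQL query with a bounded IN clause, e.g.
# SELECT ... WHERE object_id IN (<at most `size` ids>);
object_ids = range(1, 26)
batches = list(chunked(object_ids, 10))
# batches is [[1..10], [11..20], [21..25]]: three bounded IN lists
```

This is essentially what .iterator(chunk_size=…) does for you: it streams the outer queryset and runs one bounded prefetch query per batch.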
Another idea is to limit the amount of data transferred by fetching only the relevant columns of the object and the tag, in this case the .name, using .only(…)
[Django-doc]:
from django.db.models import Prefetch

obj_qs = models.Object.objects.filter(created_time__gt=2024).only('name')
obj_qs = obj_qs.prefetch_related(
    Prefetch('tags', queryset=Tag.objects.only('pk', 'name'))
)
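If you want full control over the IN-list size, you can also replicate what prefetch_related does internally: fetch the through-table rows in bounded batches yourself and group them by object id in Python. A minimal sketch of the grouping step, assuming each batch query returns (object_id, tag_name) pairs (`group_tags_by_object` is a hypothetical helper, not a Django API):

```python
from collections import defaultdict

def group_tags_by_object(rows):
    """Group (object_id, tag_name) pairs - the shape of one bounded
    prefetch query's result - into {object_id: [tag_name, ...]}."""
    tags_by_object = defaultdict(list)
    for object_id, tag_name in rows:
        tags_by_object[object_id].append(tag_name)
    return dict(tags_by_object)

# One batch worth of rows, as the prefetch SQL would return them:
rows = [(1, 'red'), (1, 'blue'), (2, 'green'), (3, 'red')]
grouped = group_tags_by_object(rows)
# grouped == {1: ['red', 'blue'], 2: ['green'], 3: ['red']}
```

You would run one such bounded query per batch of object ids (e.g. via Tag.objects.filter(object__pk__in=batch)) and attach the grouped names during serialization, instead of relying on one giant IN list.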