I have a Story model with a M2M relationship to some Resource objects. Some of the Resource objects are missing a name so I want to copy the title of the Story to the assigned Resource objects.
Here is my code:
from collector import models
from django.core.paginator import Paginator
paginator = Paginator(models.Story.objects.all(), 1000)
def fix_issues():
for page in range(1, paginator.num_pages + 1):
for story in paginator.page(page).object_list:
name_story = story.title
for r in story.resources.select_subclasses():
if r.name != name_story:
r.name = name_story
r.save()
if len(r.name) == 0:
print("Something went wrong: " + name_story)
print("done processing page %s out of %s" % (page, paginator.num_pages))
fix_issues()
I need to use a paginator because I'm dealing with a million objects. The weird part is that after calling fix_issues() about half of my resources that had no name, now have the correct name, while the other half still has no name. I can call fix_issues() again and again and every time more objects receive a name. This seems really weird to me, why would an object not be updated the first time but only the second time?
Additional information:
My Model:
class Resource(models.Model):
name = models.CharField(max_length=200)
# Important for retrieving the correct subtype.
objects = InheritanceManager()
def __str__(self):
return str(self.name)
class CustomResource(Resource):
homepage = models.CharField(max_length=3000, default="", blank=True, null=True)
class Story(models.Model):
url = models.URLField(max_length=3000)
resources = models.ManyToManyField(Resource)
popularity = models.FloatField()
def _update_popularity(self):
self.popularity = 3
def save(self, *args, **kwargs):
super(Story, self).save(*args, **kwargs)
self._update_popularity()
super(Story, self).save(*args, **kwargs)
Documentation for the select_subclasses: http://django-model-utils.readthedocs.io/en/latest/managers.html#inheritancemanager
Further investigation: I thought that maybe select_subclasses did not return all the objects. Right now every story has exactly one resource. So it was easy enough to check that select_subclasses always returns one item. This is the function I used:
def find_issues():
for page in range(1, paginator.num_pages + 1):
for story in paginator.page(page).object_list:
assert(len(story.resources.select_subclasses()) == 1)
print("done processing page %s out of %s" % (page, paginator.num_pages))
But again, this executes without any problems. So I don't thing the select_subclasses is to blame. I also checked if paginator.num_pages is right and it is. If i divide by 1000 (items per page) I get exactly the number of stories I have in my database.
I think I know what is happening:
The Paginator loads a Queryset and gives me the first n items. I process these and update some of the values. But for the next iteration the order of the items in the queryset changes (because I updated some of them and did not define an order). So I'm skipping over items that are now on the first page. I can avoid it by specifying an order (pk for example).
If you think I'm wrong, please let me know. Otherwise I will accept this as the correct answer. Thank you.