python django natural-sort

Natural sort on Django Queryset


I am working on a system that lists a range of products sorted by their product code. The product codes are made up of two letters followed by a number, for example EG1.

I currently sort these products by doing a simple

Product.objects.order_by('product_code'),

however, as there can be multi-digit product codes (for example EG12), these come out ahead of the single-digit codes, i.e. EG1, EG11, EG12, EG13 ... EG19, EG2, EG20, etc.
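The same ordering can be reproduced outside the database. This is a plain-Python sketch with made-up sample codes, not the actual query; it only shows that string comparison works character by character:

```python
# Made-up sample data, standing in for the product_code column.
codes = ["EG1", "EG2", "EG11", "EG12", "EG20", "EG19"]

# Strings are compared one character at a time, so "EG11" sorts before
# "EG2" because "1" < "2" at the third character.
lexicographic = sorted(codes)
print(lexicographic)
```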

I know that adding leading zeros to the product codes would fix this (i.e. EG01 rather than EG1), but as there is already printed literature and an existing site using EG1, this is not an option.

Is there a way to fix this to show these products in the correct order?


Solution

  • I think the implementation here (https://github.com/nathforge/django-naturalsortfield) should work. The main advantage of this method is that the sorting is done in the database rather than in Python, so it performs well even on large datasets, at the cost of some additional storage.

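    You have to change your model to include an extra sort field next to the field you want to order by. The example below does this for a title field; for your model it would be something like product_code_sort for product_code.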

    class MyModel(models.Model):
        title = models.CharField(max_length=255)
        title_sort = NaturalSortField('title')
    

    where the NaturalSortField is defined as

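    import re

    from django.db import models


    class NaturalSortField(models.CharField):
        def __init__(self, for_field, **kwargs):
            # Name of the model field whose value this field mirrors.
            self.for_field = for_field
            kwargs.setdefault('db_index', True)
            kwargs.setdefault('editable', False)
            kwargs.setdefault('max_length', 255)
            super(NaturalSortField, self).__init__(**kwargs)

        def deconstruct(self):
            # for_field is a required argument, so migrations need it back.
            name, path, args, kwargs = super(NaturalSortField, self).deconstruct()
            kwargs['for_field'] = self.for_field
            return name, path, args, kwargs

        def pre_save(self, model_instance, add):
            # Recompute the sort key from the source field on every save.
            return self.naturalize(getattr(model_instance, self.for_field))

        def naturalize(self, string):
            def naturalize_int_match(match):
                # Zero-pad each run of digits to 8 places: '1' -> '00000001'.
                return '%08d' % (int(match.group(0)),)

            string = string.lower()
            string = string.strip()
            string = re.sub(r'^the\s+', '', string)  # ignore a leading "the "
            string = re.sub(r'\d+', naturalize_int_match, string)

            return string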
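To see what ends up stored in the sort column, here is a minimal sketch that copies the naturalize logic into a standalone function (no Django needed) and applies it to some made-up product codes. The function name and sample data are only for illustration:

```python
import re


def naturalize(string):
    """Standalone copy of the naturalize logic, for experimenting."""
    string = string.lower().strip()
    string = re.sub(r'^the\s+', '', string)
    # Zero-pad every run of digits to 8 places so that plain string
    # ordering agrees with numeric ordering.
    return re.sub(r'\d+', lambda m: '%08d' % int(m.group(0)), string)


# Made-up sample data, standing in for the product_code column.
codes = ["EG1", "EG11", "EG12", "EG19", "EG2", "EG20"]

key_for_eg1 = naturalize("EG1")
natural = sorted(codes, key=naturalize)
print(key_for_eg1)
print(natural)
```

With the field on your model (assuming you name it product_code_sort, which is my choice of name, not something from the library), the query would become Product.objects.order_by('product_code_sort'). Since the value is only computed in pre_save, existing rows need to be saved again once before they get a sort key. Numbers longer than 8 digits will not be padded, so they fall outside what this scheme orders correctly.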