djangocountcharacterwords

Word count query in Django


Given a model with both Boolean and TextField fields, I want to do a query that finds records that match some criteria AND have more than "n" words in the TextField. Is this possible? e..g.:

class Item(models.Model):

    ...    
    notes = models.TextField(blank=True,)
    has_media = models.BooleanField(default=False)
    completed = models.BooleanField(default=False)
    ...

This is easy:

items = Item.objects.filter(completed=True,has_media=True)

but how can I filter for a subset of those records where the "notes" field has more than, say, 25 words?


Solution

  • Try this:

    Item.objects.extra(where=["LENGTH(notes) - LENGTH(REPLACE(notes, ' ', ''))+1 > %s"], params=[25])
    

    This code uses Django's extra queryset method to add a custom WHERE clause. The calculation in the WHERE clause basically counts the occurances of the "space" character, assuming that all words are prefixed by exactly one space character. Adding one to the result accounts for the first word.

    Of course, this calculation is only an approximation to the real word count, so if it has to be precise, I'd do the word count in Python.