djangopostgresqldjango-annotatedjango-jsonfield

Django annotate count in JSONField with Postgres


Using Django I have a field which is of type JSONField. I am wanting to get a distinct count on a nested key/value in the json. With a normal field you can just do soemthing like the following

model.objects.values('field_name')\
.annotate(total=Count('field_name')).order_by('-total')

This does not work for a JSONField.

Example Model

class Pet(models.Model):
    data = JSONField()

Example of data

  {
    'name':'sparky',
    'animal':'dog',
    'diet':{
        'breakfast':'biscuits',
        'dinner':'meat',
    }
}

Trying

Pet.objects.values('data__diet__dinner')\
.annotate(total=Count('data__diet__dinner')).order_by('-total')

Exception of

TypeError: unhashable type: 'list'

What is the correct way to execute this?


Solution

  • You can use jsonb_extract_path_text via a Func object as an alternative to the field transform:

    from django.db.models.aggregates import Count
    from django.db.models.expressions import F, Func, Value
    
    pets = (
        Pet.objects.annotate(
            dinner=Func(
                F("data"),
                Value("diet"),
                Value("dinner"),
                function="jsonb_extract_path_text",
            )
        )
        .values("dinner")
        .annotate(total=Count("dinner"))
    )
    

    The reason why the field transform data__diet__dinner fails is an error within Django when you go deeper than just one level into the json structure and use GROUP BY in the SQL. The first level (name, animal, diet) should work fine.

    The reason seems to be that for nested transforms, Django changes the SQL syntax used, switching from a single value to a list to specify the path into the json structure.

    This is the syntax used for non-nested json transforms (= first level):

    "appname_pet"."data" -> 'diet'
    

    And this is the syntax used for nested transforms (deeper than first level):

    "appname_pet"."data" #> ARRAY['diet', 'dinner']
    

    While constructing the query, Django chokes on that list while working out the required GROUP BY clauses. This does not seem to be an inevitable restriction; the support for transforms is quite new, and this is possibly one of the kinks that haven't been worked out yet. So if you open a Django ticket, this might just work a few versions down the line.