jsondjangoorm

How to aggregate (min/max etc.) over Django JSONField data?


I'm using Django 1.9 with its built-in JSONField and Postgres 9.4. In my model's attrs json field I store objects with some values, including numbers. And I need to aggregate over them to find min/max values. Something like this:

Model.objects.aggregate(min=Min('attrs__my_key'))

Also, it would be useful to extract specific keys:

Model.objects.values_list('attrs__my_key', flat=True)

The above queries fail with

FieldError: "Cannot resolve keyword 'my_key' into field. Join on 'attrs' not permitted."

Is it possible somehow?

Notes:

  1. I know how to make a plain Postgres query to do the job, but am searching specifically for an ORM solution to have the ability to filter etc.
  2. I suppose this can be done with a (relatively) new query expressions/lookups API, but I haven't studied it yet.

Solution

  • For those who interested, I've found the solution (or workaround at least).

    from django.db.models.expressions import RawSQL
    
    Model.objects.annotate(
        val=RawSQL("((attrs->>%s)::numeric)", (json_field_key,))
    ).aggregate(min=Min('val'))
    

    Note that attrs->>%s expression will become something like attrs->>'width' after processing (I mean single quotes). So if you hardcode this name you should remember to insert them or you will get error.

    A little bit off-topic

    And one more tricky issue not related to Django itself but that is needed to be handled somehow. As attrs is a JSON field and there are no restrictions on its keys and values, you can (depending on you application logic) get some non-numeric values, for example in a width key. In this case you will get DataError from Postgres as a result of executing the above query. NULL values will be ignored meanwhile so it's OK. If you can just catch the error then no problem, you're lucky. In my case I needed to ignore wrong values and the only way here is to write a custom Postgres function that will suppress casting errors.

    create or replace function safe_cast_to_numeric(text) returns numeric as $$
    begin
        return cast($1 as numeric);
    exception
        when invalid_text_representation then
            return null;
    end;
    $$ language plpgsql immutable;
    

    And then use it to cast text to numbers:

    Model.objects.annotate(
        val=RawSQL("safe_cast_to_numeric(attrs->>%s)", (json_field_key,))
    ).aggregate(min=Min('val'))
    

    Thus we get quite a solid solution for such a dynamic thing as JSON.