
Django & Postgres - percentile (median) and group by

I need to calculate period medians per seller ID (see the simplified model below). The problem is that I am unable to construct the ORM query.


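from django.contrib.postgres.fields import ArrayField, JSONField
from django.db import models

class MyModel(models.Model):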
    period = models.IntegerField(null=True, default=None)
    seller_ids = ArrayField(models.IntegerField(), default=list)
    aux = JSONField(default=dict)


queryset = (
    MyModel.objects
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .annotate(
        duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
        median=Func(F("duration"), function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)"),
    )
    .values("median", "seller_id")
)

ArrayField aggregation (seller_id) source

I think what I need to do is something along these lines:

select t.*, p_25, p_75
from t join
     (select district,
             percentile_cont(0.25) within group (order by sales) as p_25,
             percentile_cont(0.75) within group (order by sales) as p_75
      from t
      group by district
     ) td
     on t.district = td.district

above example source
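
To make the semantics of percentile_cont concrete, here is a minimal pure-Python sketch of the grouped query above. It assumes the linear-interpolation definition that Postgres documents for percentile_cont; the sample rows and the helper name percentile_cont are made up for illustration, and nothing here touches the database.

```python
import math
from collections import defaultdict


def percentile_cont(values, fraction):
    """Linear-interpolation percentile (hypothetical helper, not a Django API)."""
    ordered = sorted(values)
    if not ordered:
        return None
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Invented (district, sales) rows standing in for table t.
rows = [
    ("north", 10), ("north", 20), ("north", 30), ("north", 40),
    ("south", 5), ("south", 15),
]

# GROUP BY district
sales_by_district = defaultdict(list)
for district, sales in rows:
    sales_by_district[district].append(sales)

# (p_25, p_75) per district, as in the subquery td
quartiles = {
    district: (percentile_cont(sales, 0.25), percentile_cont(sales, 0.75))
    for district, sales in sales_by_district.items()
}
```

Joining quartiles back onto rows by district would give the same shape as the outer select.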

Python 3.7.5, Django 2.2.8, Postgres 11.1


  • You can create a Median subclass of the Aggregate class, as was done by Ryan Murphy. Median then works just like Avg:

        from django.db.models import Aggregate, FloatField
        class Median(Aggregate):
            function = 'PERCENTILE_CONT'
            name = 'median'
            output_field = FloatField()
            template = '%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)'

    Then, to find the median of a field, use:

        my_model_aggregate = MyModel.objects.all().aggregate(Median('period'))

    which is then available as my_model_aggregate['period__median'].
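
For checking results, the per-seller grouping from the question can be mimicked in plain Python. This is only a sketch under assumptions: the rows are invented, the aggregated value is taken to be aux["duration"], and statistics.median stands in for PERCENTILE_CONT(0.5), which it matches for numeric input because both average the two middle values of an even-sized group.

```python
from collections import defaultdict
from statistics import median

# Invented rows shaped like MyModel: (period, seller_ids, aux).
rows = [
    (1, [1, 2], {"duration": "10"}),
    (1, [1], {"duration": "20"}),
    (1, [2, 3], {"duration": "40"}),
    (1, [1], {"duration": "60"}),
]

durations_by_seller = defaultdict(list)
for period, seller_ids, aux in rows:
    # One output row per array element, like unnest(seller_ids).
    for seller_id in seller_ids:
        # Mirrors Cast(KeyTextTransform("duration", "aux"), IntegerField()).
        durations_by_seller[seller_id].append(int(aux["duration"]))

# Median duration per seller, i.e. the GROUP BY seller_id result.
medians = {
    seller_id: median(durations)
    for seller_id, durations in durations_by_seller.items()
}
```

Comparing medians against the rows returned by the ORM query is a quick way to confirm that the grouping happens per seller rather than per model row.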