djangopostgresqlhyperloglog

Django culmulative sum of HyperLogLog (HLL) Postgres field


I'm using the HyperLogLog (hll) field to represent unique users, using the Django django-pg-hll package. What I'd like to do is get a cumulative total of unique users over a specific time period, but I'm having trouble doing this.

Given a model like:

class DailyUsers(model.Model):
  date = models.DateField()
  users = HllField()

I can get the cummulative HllField for each day like so:

queryset = models.DailyUsers.objects.annotate(
  cumulative_hll_users=Window(
    UnionAgg("users"), order_by=F('date').asc()
  )
)

However, when I try and get the cardinality (the actual number) like so:

queryset = queryset.annotate(
  cumsum=Cardinality("cumulative_hll_users")
)

The following error occurs:

django.db.utils.ProgrammingError: OVER specified, but hll_cardinality is not a window function nor an aggregate function
LINE 1: SELECT "app_dailyusers"."date", hll_cardinality...

Which is odd because Cardinality is defined as an aggregate function. I'm not sure if there's a way around this, I imagine it might be possible to do this in raw sql, but I haven't made much progress.

A solution in either the Django ORM or raw SQL would be greatly appreciated.


Solution

  • This bug occurs because the django-pg-hll pacakage uses the hll_cardinalityfunction instead of the # operator for window functions. Moving to a raw sql solution fixed the issue.