druid pydruid

Apache Druid: pagination and SQL Rank() equivalent


As of Druid 0.17, there's no query pagination support (it was formerly available through select queries).

I'm trying to emulate pagination. One approach I considered is creating a virtual dimension that numbers the rows of a query result, so that I can filter on that dimension.

This is something that can be easily done in SQL using the RANK function. I was wondering if there is anything similar in Druid.
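To make the idea concrete, here is a minimal sketch of RANK-based pagination in generic SQL, run through Python's built-in sqlite3 module rather than Druid. The `events` table, its columns, and the `fetch_page` helper are made up for illustration, and it assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3


def fetch_page(conn, page, page_size):
    """Return one zero-indexed page of rows ordered by ts.

    RANK() numbers the rows; because ts is unique in this toy table,
    the rank behaves like a row number and can be filtered on.
    """
    first = page * page_size + 1
    last = first + page_size - 1
    sql = """
        SELECT ts, name FROM (
            SELECT ts, name, RANK() OVER (ORDER BY ts) AS rnk
            FROM events
        )
        WHERE rnk BETWEEN ? AND ?
        ORDER BY rnk
    """
    return conn.execute(sql, (first, last)).fetchall()


# Hypothetical in-memory data: seven events with distinct timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"event-{i}") for i in range(1, 8)],
)

second_page = fetch_page(conn, page=1, page_size=3)
last_page = fetch_page(conn, page=2, page_size=3)
```

With ties in the ordering column, RANK() would assign the same number to several rows, so page sizes would no longer be exact.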

Anyway, is there any consolidated pattern for obtaining paginated queries?


Solution

  • Apache Druid 0.17.0-1 and 0.18.0 didn't have any equivalent for the SQL RANK() function.

    My idea was to order the query results, number the rows, and keep only the first one.

    In my case the query is a group-by and the ordering has to be done on Druid's __time field, so there is a workaround: depending on your needs, you can use the LATEST(expr) or EARLIEST(expr) aggregation functions. They work with both integers and strings, with a slightly different signature (the string variants take an additional maxBytesPerString argument).

    Reference: http://druid.apache.org/docs/latest/querying/sql.html#aggregation-functions

    Pagination itself is currently unavailable.
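As a rough illustration of the LATEST/EARLIEST approach, the sketch below builds a Druid SQL group-by query and the JSON body that Druid's SQL HTTP endpoint (POST /druid/v2/sql) accepts. The datasource `events`, the columns `device` and `status`, and both helper functions are hypothetical names chosen for the example. No request is actually sent.

```python
import json


def latest_per_group_sql(datasource, group_col, value_col, string_max_bytes=None):
    """Build a Druid SQL query returning the most recent value_col per group_col.

    For string columns pass string_max_bytes, which is emitted as the
    second argument of LATEST (maxBytesPerString in the Druid docs).
    """
    if string_max_bytes is None:
        agg = f'LATEST("{value_col}")'
    else:
        agg = f'LATEST("{value_col}", {int(string_max_bytes)})'
    return (
        f'SELECT "{group_col}", {agg} AS "latest_{value_col}" '
        f'FROM "{datasource}" GROUP BY "{group_col}"'
    )


def sql_payload(query):
    """Wrap a SQL string in the JSON body expected by POST /druid/v2/sql."""
    return json.dumps({"query": query})


# Hypothetical datasource and columns, for illustration only.
query = latest_per_group_sql("events", "device", "status", string_max_bytes=1024)
payload = sql_payload(query)
```

Swapping LATEST for EARLIEST in the helper gives the first value per group instead of the last. This replaces "rank and keep the first row" only when the ordering is on __time.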