databasetime-seriesquestdb

Slow ASOF JOIN query for data in same partition


I have this query, which performs quite fast.

WITH 
  events AS (SELECT * FROM events where time in '2025-02-28' and time>'2025-02-28 13:45' LIMIT 5),
  t AS (select * from trades where time in '2025-02-28')
SELECT e.time, e.sym, t.*
FROM events e
ASOF JOIN t ON sym

However, if I don't do the ASOF join from the temporary table restricted by date, and just do it directly from the whole table as follows:

WITH 
  events AS (SELECT * FROM events where time in '2025-02-28' and time>'2025-02-28 13:45' LIMIT 5)
SELECT e.time, e.sym, t.*
FROM events e
ASOF JOIN trades t ON sym;

It looks like it does a full scan of the entire "trades" table and takes many minutes to return because there are 2 years of trade data with like ~700 partitions on disk.

Is there a way to restrict ASOF joins to only look within the same partition as the left table? I really only want the last record prior to the event, and am fine having a null value if there's not one in the same partition, and the performance implications of doing a full scan of the trade table every query make it not feasible.

The issue here is that I'd like to do this join across multiple date partitions, so artificially restricting it by date won't really work, but I also would like to avoid the full table scans if possible.


Solution

  • If you run them with EXPLAIN, you will probably see AsOf Join on the slow one. It is, indeed, a slow implementation. QuestDB has an alternative AsOf Join Fast Scan. This does a more efficient scan (skips and binary search), but only works on a plain table (no filter).

    There is WIP to make AsOf Join as fast as the other cursor .For the query above, since you can accept not matching events outside your partition, you can just push down the where clause to both sides i.e:

    WITH 
      events AS (events where time in '2025-02-28' and time>'2025-02-28 13:45' LIMIT 5)
    SELECT e.time, e.sym, t.*
    FROM events e
    ASOF JOIN (trades where time in '2025-02-28' and time>'2025-02-28 13:45') ON sym
    

    If you need more partitions, you can make both WHERE clauses more complex adding an OR condition. Hopefully in a few releases this will not be needed.