databasetime-seriesquestdb

Making SAMPLE BY/UNION query more robust


Once in a while I have the use case of doing a SAMPLE BY with a FILL to interpolate empty rows for each interval, but with fixed boundaries, for example for 7 days from now. I know this has been asked in the past, but the result query I have is error prone.

In my query below I want to get hourly totals per merchant_id and for the past 7 days. I want to have a dense result with an hour per merchant_id even if I have no activity in that hour.

I need to use the merchant_id, start, and end dates in different parts of the queries, at the two UNIONS around the main query and also in the WHERE condition of my main query. It is not the first time I remember to change in one place but not in the other, and results are wrong.

I was wondering if there is some way to re-write this query so I need to enter the start and end timestamps just once to make it more robust.

WITH bounded AS ( (
    SELECT  date_trunc('day', dateadd('d', -31, now())) as timestamp, 1 as merchant_id, 0 as deposit, 0 as withdraw
    UNION ALL
    SELECT timestamp, merchant_id,
            CASE WHEN transaction_type = 'deposit' then 1 ELSE 0 END AS deposit,
            CASE WHEN transaction_type = 'withdraw' then 1 ELSE 0 END AS withdraw
        FROM transactions
        WHERE merchant_id = 1 AND status = 'success'
        AND timestamp between   date_trunc('day', dateadd('d', -31, now())) AND now()
    UNION ALL
    SELECT  now() as timestamp, 1 as merchant_id, 0 as deposit, 0 as withdraw
)  timestamp(timestamp) )
SELECT 
    timestamp,
    merchant_id,
    SUM(deposit) AS transaction_deposit_count,
    SUM(withdraw) AS transaction_withdraw_count
FROM bounded
SAMPLE BY 1d FILL(0,0);

Solution

  • I found a way to solve this :)

    In QuestDB you can declare query variables, and those variables can be not just literals but actually function calls. I declared @lower_bound and @upper_bound variables and I can just use them inside my query as many times as needed, but changing at a single point. This way it is impossible to get inconsistent values and broken results.

    DECLARE 
        @merchant_id := 1,
        @lower_bound := date_trunc('day', dateadd('d', -7, now())),
        @upper_bound := now()
    WITH bounded AS ( (
        SELECT @lower_bound as timestamp, @merchant_id as merchant_id, 0 as deposit, 0 as withdraw
        UNION ALL
        SELECT timestamp, merchant_id,
                CASE WHEN transaction_type = 'deposit' then 1 ELSE 0 END AS deposit,
                CASE WHEN transaction_type = 'withdraw' then 1 ELSE 0 END AS withdraw
            FROM transactions
            WHERE merchant_id = 1 AND status = 'success'
            AND timestamp between  @lower_bound AND @upper_bound
        UNION ALL
        SELECT @upper_bound as timestamp, @merchant_id as merchant_id, 0 as deposit, 0 as withdraw
    )  timestamp(timestamp) )
    SELECT 
        timestamp,
        merchant_id,
        SUM(deposit) AS transaction_deposit_count,
        SUM(withdraw) AS transaction_withdraw_count
    FROM bounded
    SAMPLE BY 1d FILL(0,0);