questdb

Is it possible to SAMPLE BY any timestamp column in QuestDB?


All SAMPLE BY examples at https://questdb.io/docs/reference/sql/sample-by/ use the designated timestamp column of a table.

Is it possible to use a different column ?


Solution

  • Yes, it's possible by electing a dynamic timestamp.

    Assuming the following schema and a few day's worth of data:

    create table test 
    ( 
      ts timestamp, 
      changed timestamp, 
      code int, 
      value int
    ) 
    timestamp(ts) partition by year;
    
    insert into test 
    select dateadd('s', x::int, '2023-01-01T00:00:00.000000Z'), 
           dateadd('s', x::int, '2023-02-01T00:00:00.000000Z'), x%100, x 
    from long_sequence(400000);
    

    the usual SAMPLE BY :

    select ts, count(*), min(value), max(value)  
    from  test 
    sample by 1d
    

    returns

    ts                          count   min     max
    2023-01-01T00:00:01.000000Z 86400   1       86400
    2023-01-02T00:00:01.000000Z 86400   86401   172800
    2023-01-03T00:00:01.000000Z 86400   172801  259200
    2023-01-04T00:00:01.000000Z 86400   259201  345600
    2023-01-05T00:00:01.000000Z 54400   345601  400000
    

    while SAMPLE BY with changed column:

    select changed, count(*), min(value), max(value) 
    from 
    (
      select * from test timestamp(changed)
    ) 
    sample by 1d
    

    yields:

    changed                     count   min     max
    2023-02-01T00:00:01.000000Z 86400   1       86400
    2023-02-02T00:00:01.000000Z 86400   86401   172800
    2023-02-03T00:00:01.000000Z 86400   172801  259200
    2023-02-04T00:00:01.000000Z 86400   259201  345600
    2023-02-05T00:00:01.000000Z 54400   345601  400000
    

    Note the 1 month difference between timestamps in ts and changed columns.

    If values of the column we want to SAMPLE BY are not arranged in increasing order, result may be weird .

    Let's recreate data in test table, but this time insert changed values in descending order :

    truncate table test;
    
    insert into test 
    select dateadd('s', x::int, '2023-01-01T00:00:00.000000Z'), 
       dateadd('s', -x::int, '2023-03-01T00:00:00.000000Z'), x%100, x 
    from long_sequence(400000);
    

    Now, running earlier SAMPLE BY query returns:

    changed                                 count   min max
    2023-02-28T23:59:59.000000Z 400000  1   400000
    

    Not something we'd expect compared to similar query that uses GROUP BY:

    select timestamp_floor('d', changed) as day, count(*), min(value), max(value)  
    from  test 
    group by 1
    order by 1
    

    Query returns:

    day                         count   min     max
    2023-02-24T00:00:00.000000Z 54400   345601  400000
    2023-02-25T00:00:00.000000Z 86400   259201  345600
    2023-02-26T00:00:00.000000Z 86400   172801  259200
    2023-02-27T00:00:00.000000Z 86400   86401   172800
    2023-02-28T00:00:00.000000Z 86400   1       86400