[SOLVED] Is it possible to SAMPLE BY any timestamp column in QuestDB?

Is it possible to SAMPLE BY any timestamp column in QuestDB?

All SAMPLE BY examples at https://questdb.io/docs/reference/sql/sample-by/ use the designated timestamp column of a table.

Is it possible to use a different column ?

Solution

Yes, it's possible by electing a dynamic timestamp.

Assuming the following schema and a few day's worth of data:

create table test 
( 
  ts timestamp, 
  changed timestamp, 
  code int, 
  value int
) 
timestamp(ts) partition by year;

insert into test 
select dateadd('s', x::int, '2023-01-01T00:00:00.000000Z'), 
       dateadd('s', x::int, '2023-02-01T00:00:00.000000Z'), x%100, x 
from long_sequence(400000);

the usual SAMPLE BY :

select ts, count(*), min(value), max(value)  
from  test 
sample by 1d

returns

ts                          count   min     max
2023-01-01T00:00:01.000000Z 86400   1       86400
2023-01-02T00:00:01.000000Z 86400   86401   172800
2023-01-03T00:00:01.000000Z 86400   172801  259200
2023-01-04T00:00:01.000000Z 86400   259201  345600
2023-01-05T00:00:01.000000Z 54400   345601  400000

while SAMPLE BY with changed column:

select changed, count(*), min(value), max(value) 
from 
(
  select * from test timestamp(changed)
) 
sample by 1d

yields:

changed                     count   min     max
2023-02-01T00:00:01.000000Z 86400   1       86400
2023-02-02T00:00:01.000000Z 86400   86401   172800
2023-02-03T00:00:01.000000Z 86400   172801  259200
2023-02-04T00:00:01.000000Z 86400   259201  345600
2023-02-05T00:00:01.000000Z 54400   345601  400000

Note the 1 month difference between timestamps in ts and changed columns.

If values of the column we want to SAMPLE BY are not arranged in increasing order, result may be weird .

Let's recreate data in test table, but this time insert changed values in descending order :

truncate table test;

insert into test 
select dateadd('s', x::int, '2023-01-01T00:00:00.000000Z'), 
   dateadd('s', -x::int, '2023-03-01T00:00:00.000000Z'), x%100, x 
from long_sequence(400000);

Now, running earlier SAMPLE BY query returns:

changed                                 count   min max
2023-02-28T23:59:59.000000Z 400000  1   400000

Not something we'd expect compared to similar query that uses GROUP BY:

select timestamp_floor('d', changed) as day, count(*), min(value), max(value)  
from  test 
group by 1
order by 1

Query returns:

day                         count   min     max
2023-02-24T00:00:00.000000Z 54400   345601  400000
2023-02-25T00:00:00.000000Z 86400   259201  345600
2023-02-26T00:00:00.000000Z 86400   172801  259200
2023-02-27T00:00:00.000000Z 86400   86401   172800
2023-02-28T00:00:00.000000Z 86400   1       86400