All SAMPLE BY examples at https://questdb.io/docs/reference/sql/sample-by/ use the designated timestamp column of a table.
Is it possible to use a different column ?
Yes, it's possible by electing a dynamic timestamp.
Assuming the following schema and a few day's worth of data:
create table test
(
ts timestamp,
changed timestamp,
code int,
value int
)
timestamp(ts) partition by year;
insert into test
select dateadd('s', x::int, '2023-01-01T00:00:00.000000Z'),
dateadd('s', x::int, '2023-02-01T00:00:00.000000Z'), x%100, x
from long_sequence(400000);
the usual SAMPLE BY :
select ts, count(*), min(value), max(value)
from test
sample by 1d
returns
ts count min max
2023-01-01T00:00:01.000000Z 86400 1 86400
2023-01-02T00:00:01.000000Z 86400 86401 172800
2023-01-03T00:00:01.000000Z 86400 172801 259200
2023-01-04T00:00:01.000000Z 86400 259201 345600
2023-01-05T00:00:01.000000Z 54400 345601 400000
while SAMPLE BY with changed
column:
select changed, count(*), min(value), max(value)
from
(
select * from test timestamp(changed)
)
sample by 1d
yields:
changed count min max
2023-02-01T00:00:01.000000Z 86400 1 86400
2023-02-02T00:00:01.000000Z 86400 86401 172800
2023-02-03T00:00:01.000000Z 86400 172801 259200
2023-02-04T00:00:01.000000Z 86400 259201 345600
2023-02-05T00:00:01.000000Z 54400 345601 400000
Note the 1 month difference between timestamps
in ts and changed
columns.
If values of the column we want to SAMPLE BY are not arranged in increasing order, result may be weird .
Let's recreate data in test table, but this time insert changed
values in descending order :
truncate table test;
insert into test
select dateadd('s', x::int, '2023-01-01T00:00:00.000000Z'),
dateadd('s', -x::int, '2023-03-01T00:00:00.000000Z'), x%100, x
from long_sequence(400000);
Now, running earlier SAMPLE BY query returns:
changed count min max
2023-02-28T23:59:59.000000Z 400000 1 400000
Not something we'd expect compared to similar query that uses GROUP BY:
select timestamp_floor('d', changed) as day, count(*), min(value), max(value)
from test
group by 1
order by 1
Query returns:
day count min max
2023-02-24T00:00:00.000000Z 54400 345601 400000
2023-02-25T00:00:00.000000Z 86400 259201 345600
2023-02-26T00:00:00.000000Z 86400 172801 259200
2023-02-27T00:00:00.000000Z 86400 86401 172800
2023-02-28T00:00:00.000000Z 86400 1 86400