I have a problem which I am struggling to find a solution for in SQL.
In QuestDB I have 100+ tables which the same name structure and same schema: “value, time”.
Is there a query that I can do to combine them such that i have the data for each table in a single column now it will be “time, table1, table2, table3…”.
P.S. For context: I need this so that i can visualize them easier together in Grafana. P.S.2 I get all table names like so:
SELECT table_name FROM tables()
WHERE table_name ~'example_table_';
I am going to answer this by parts. First about the type of query you would need to show the data in the format you want IF you already know all the table names beforehand (so a static query), and then I will add how you can get the table names dynamically, as you are already doing, and compose the query on the fly.
For this I am creating first from the QuestDB web console four very simple tables and I am adding just a few rows to each:
create table b1 (x long, ts timestamp) timestamp(ts) partition by day;
insert into b1
SELECT x, timestamp_sequence(
to_timestamp('2019-10-17T00:00:00', 'yyyy-MM-ddTHH:mm:ss'),
10000000L)
FROM long_sequence(5);
create table b2(like b1);
create table b3(like b1);
create table b4(like b1);
insert into b2 select * from b1;
insert into b3 select * from b1;
insert into b4 select * from b1;
Now, if I want to have the combination of those tables at the same timestamp, I could do a simple ASOF JOIN
SELECT b1.ts, b1.x, b2.x, b3.x, b4.x from b1 asof join b2 asof join b3 asof join b4;
This would give me every row from b1, and for each row it would join with the row at the exact timestamp from the other tables, or if not exact, at the closest past timestamp from each table. This works perfect if b1 has data for all the timestamps that you need. But imagine the 4 tables receive data at different intervals. Let's suppose for example that b1
gets one record every 5 seconds and b2
gets 1 record per second.
In this case, since ASOF JOIN
behaves like an outer join, it would show only one record for every 5 seconds, which is probably not what you need.
Since you are using this for a dashboard, very likely it is safe to do an aggregation and so something like:
SELECT b1.ts, avg(b1.x), avg(b2.x), avg(b3.x), avg(b4.x) from b1 asof join b2 asof join b3 asof join b4
SAMPLE BY 1s FROM $__fromTime TO $__toTime FILL(PREV);
This query takes the average in 1 second intervals for all the tables, but more importantly, it will fill any gaps with the with the PREVious known value. We are also using the grafana macros to force the sampling to start and end at the intervals included on the dashboard, so you will get a row per second for each second in the interval, even if no data for some (or all) of the tables is available at that second. You might prefer to fill with NULL
or LINEAR
values, rather than PREV
, but probably for this type of chart PREV
is a safe bet.
So far we have solved the problem of how to join multiple tables with the same structure, but we still need to approach how to do this dynamically. This needs to be done programmatically, as you need two steps, first you need to issue the query you already have to get the table names, then you need to compose the query using those names. But since you are using grafana, we can use dashboard variables.
Your query has two dynamic parts: the column names and the table list in the FROM
, so we will need to create two dashboard variables in grafana and use some SQL to compose those parts.
I am creating first a variable named $b_tables
with this SQL:
with tbs as
(
SELECT string_agg(table_name, ',') as names FROM tables()
WHERE table_name like 'b%'
)
select
replace(names, ',', ' ASOF JOIN ') from tbs;
The output value of this query/variable will be: b1 ASOF JOIN b2 ASOF JOIN b3 ASOF JOIN b4
Now I am defining a second variable named $column_avgs
with this SQL:
SELECT string_agg(concat('avg(', table_name, '.x)'),',') as names FROM tables()
WHERE table_name like 'b%'
The value for this variable will be: avg(b1.x),avg(b2.x),avg(b3.x),avg(b4.x)
And now I have this, I can just go to my chart on the dashboard and compose a query like:
SELECT b1.ts, $colum_avgs from
$b_tables
sample by 1s FROM $__fromTime TO $__toTime FILL(LINEAR)
When I click on the chart to test the query, I can see this is sent to QuestDB:
SELECT b1.ts, avg(b1.x),avg(b2.x),avg(b3.x),avg(b4.x) from
b1 ASOF JOIN b2 ASOF JOIN b3 ASOF JOIN b4
sample by 1s FROM cast(1571176800000000 as timestamp) TO cast(1571349600000000 as timestamp) FILL(PREV)
Which is what I was after.
If you are not using grafana and need to do this programatically, the idea would be the same. You can first issue a query to get table names, then you can compose on the client side the query to issue to QuestDB.