databasetime-seriesgrafanaquestdb

Query multiple tables with same structure dynamically


I have a problem which I am struggling to find a solution for in SQL.

In QuestDB I have 100+ tables which the same name structure and same schema: “value, time”.

Is there a query that I can do to combine them such that i have the data for each table in a single column now it will be “time, table1, table2, table3…”.

P.S. For context: I need this so that i can visualize them easier together in Grafana. P.S.2 I get all table names like so:

SELECT table_name FROM tables() 
WHERE table_name ~'example_table_';

Solution

  • I am going to answer this by parts. First about the type of query you would need to show the data in the format you want IF you already know all the table names beforehand (so a static query), and then I will add how you can get the table names dynamically, as you are already doing, and compose the query on the fly.

    For this I am creating first from the QuestDB web console four very simple tables and I am adding just a few rows to each:

    create table b1 (x long, ts timestamp) timestamp(ts) partition by day;
    
    insert into b1
    SELECT x, timestamp_sequence(
                to_timestamp('2019-10-17T00:00:00', 'yyyy-MM-ddTHH:mm:ss'),
                10000000L)
    FROM long_sequence(5);
    
    create table b2(like b1);
    create table b3(like b1);
    create table b4(like b1);
    insert into b2 select * from b1;
    insert into b3 select * from b1;
    insert into b4 select * from b1;
    

    Now, if I want to have the combination of those tables at the same timestamp, I could do a simple ASOF JOIN

    SELECT b1.ts, b1.x, b2.x, b3.x, b4.x from b1 asof join b2 asof join b3 asof join b4;
    

    This would give me every row from b1, and for each row it would join with the row at the exact timestamp from the other tables, or if not exact, at the closest past timestamp from each table. This works perfect if b1 has data for all the timestamps that you need. But imagine the 4 tables receive data at different intervals. Let's suppose for example that b1 gets one record every 5 seconds and b2 gets 1 record per second.

    In this case, since ASOF JOIN behaves like an outer join, it would show only one record for every 5 seconds, which is probably not what you need.

    Since you are using this for a dashboard, very likely it is safe to do an aggregation and so something like:

    SELECT b1.ts, avg(b1.x), avg(b2.x), avg(b3.x), avg(b4.x) from b1 asof join b2 asof join b3 asof join b4
    SAMPLE BY 1s FROM $__fromTime TO $__toTime  FILL(PREV);
    

    This query takes the average in 1 second intervals for all the tables, but more importantly, it will fill any gaps with the with the PREVious known value. We are also using the grafana macros to force the sampling to start and end at the intervals included on the dashboard, so you will get a row per second for each second in the interval, even if no data for some (or all) of the tables is available at that second. You might prefer to fill with NULL or LINEAR values, rather than PREV, but probably for this type of chart PREV is a safe bet.

    So far we have solved the problem of how to join multiple tables with the same structure, but we still need to approach how to do this dynamically. This needs to be done programmatically, as you need two steps, first you need to issue the query you already have to get the table names, then you need to compose the query using those names. But since you are using grafana, we can use dashboard variables.

    Your query has two dynamic parts: the column names and the table list in the FROM, so we will need to create two dashboard variables in grafana and use some SQL to compose those parts.

    I am creating first a variable named $b_tables with this SQL:

    with tbs as 
    (
      SELECT string_agg(table_name, ',') as names FROM tables() 
    WHERE table_name like 'b%'
    )
    select 
    replace(names, ',', ' ASOF JOIN ') from tbs;
    

    The output value of this query/variable will be: b1 ASOF JOIN b2 ASOF JOIN b3 ASOF JOIN b4

    Now I am defining a second variable named $column_avgs with this SQL:

    SELECT string_agg(concat('avg(', table_name, '.x)'),',') as names FROM tables() 
      WHERE table_name like 'b%'
    

    The value for this variable will be: avg(b1.x),avg(b2.x),avg(b3.x),avg(b4.x)

    And now I have this, I can just go to my chart on the dashboard and compose a query like:

    SELECT  b1.ts, $colum_avgs from 
    $b_tables 
    sample by 1s FROM $__fromTime   TO $__toTime FILL(LINEAR)   
    

    When I click on the chart to test the query, I can see this is sent to QuestDB:

    SELECT  b1.ts, avg(b1.x),avg(b2.x),avg(b3.x),avg(b4.x) from 
    b1 ASOF JOIN b2 ASOF JOIN b3 ASOF JOIN b4 
    sample by 1s FROM cast(1571176800000000 as timestamp)   TO cast(1571349600000000 as timestamp) FILL(PREV)   
    

    Which is what I was after.

    If you are not using grafana and need to do this programatically, the idea would be the same. You can first issue a query to get table names, then you can compose on the client side the query to issue to QuestDB.