sqlpostgresqldynamic-sql

How to club multiple select queries from one single table


I would like to query a table which has few columns (such as 4 below) from where i would like to create a snapshot table which can provide insight on the attribute count everyday

userId attr1 attr2 attr3
1 true Hello Dog
2 false Hi
3 Hello
4 false Cat
5 true Hi Dog
6 false Cat
7 Hi
8 false Dog
9 Cat
10 false Hello Cat

Such as from above

userIdCount attrName dateWhenSnapshotInserted
7 attr1 April 16, 2024
6 attr2 April 16, 2024
7 attr3 April 16, 2024

In this case, we may have millions of rows in original table with more than 100 columns and we would like to populate snapshot table everyday with count of user with non null values for each attribute and date when this snapshot created.

I am framing my query as below :-

select COUNT(DISTINCT userId) as userIdCount from orig_table where
    attr1 IS NOT NULL UNION Select COUNT(DISTINCT userId) as userIdCount
    from orig_table where attr2 IS NOT NULL UNION Select COUNT(DISTINCT
    userId) as userIdCount from orig_table where attr3 IS NOT NULL;

Is there a optimized way to write multiple select query together. I started with creating multiple select query and inserting it to output table one by one but that seems like a bad pattern. Please suggest how to achieve this so as this can be scaled.

ps: Newbie (who is exploring DB queries first time)


Solution

  • I think you mean a materialized view. A regular view would recalculate the query on each select from it, whereas a materialized view keeps a snapshot of the result until you refresh it.

    The multiple selects part of the problem can be handled with dynamic SQL in a PL/pgSQL block.

    Demo at db<>fiddle:

    do $f$
    declare column_name_ text;
            dynamic_sql_query text;
            dynamic_sql_query_per_table text:=$$SELECT COUNT(DISTINCT "userId") AS %1$I
                                                     ,'%2$I' as "attrName"
                                                     ,now() AS "dateWhenSnapshotInserted"
                                               FROM orig_table 
                                               WHERE %2$I IS NOT NULL$$;
    begin
    for column_name_ in select column_name 
                        from information_schema.columns
                        where table_name='orig_table'
    loop
        dynamic_sql_query:=concat_ws(' UNION ALL '
                                     ,dynamic_sql_query
                                     ,format( dynamic_sql_query_per_table
                                             ,column_name_||'Count'
                                             ,column_name_));
    end loop;
    execute format('CREATE MATERIALIZED VIEW orig_table_counts AS %s',dynamic_sql_query);
    end $f$;
    
    select * from orig_table_counts;
    
    userIdCount attrName dateWhenSnapshotInserted
    10 "userId" 2024-04-16 09:07:44.604273+00
    7 attr1 2024-04-16 09:07:44.604273+00
    6 attr2 2024-04-16 09:07:44.604273+00
    7 attr3 2024-04-16 09:07:44.604273+00
    1. concat_ws() takes care of putting a union all between each query that collects counts for each column. It skips null arguments, so having dynamic_sql_query initially declared but unpopulated prevents a leading/trailing union separator.
    2. $$ are quotes that let you use other types of quotes inside them. %1$I and %2$I are format() placeholders, instructing it to insert the 1st/2nd argument as an optionally double-quoted I dentifier. %s tells it to put another one, as a plain string, without any special handling.
    3. Make sure you pay attention to your identifier case sensitivity: unless you use " double-quotes, they all get folded to lowercase, meaning that userId might in reality be called userid.