vertica

How to pivot resultset in vertica


I have a table ProductCategorySales in Vertica that has below data

Product, Month, Year, Category1Sales, Category2Sales
P001,    2,     2019, 150,            231
P001,    4,     2022, 120,            52
P002,    1,     2023, 97,             328
P003,    7,     2018, 46,             157

I need to query this table to form a resultset as below.

Product, 2_2019_Cateogory1Sales, 2_2019_Category2Sales, 4_2022_Category1Sales, 4_2022_Category2Sales, 1_2023_Category1Sales, 1_2023_Category2Sales, 7_2018_Category1Sales, 7_2018_Category2Sales
P001, 150, 231, 120,52,0,0,0,0
P002, 0,0,0,0,97,328,0,0
P003, 0,0,0,0,0,0,46,157

So basically the month and year are pivoted to concatenate with Category1Sales and Category2Sales using underscores to form columns and group by is applied on Product which will result in single row for each product.

One thing to note here is that there could be any month year for given products, so i cant write switch case statement, my pivot query needs to be dynamic.

Can someone help me with this?


Solution

  • As @minatverma has pointed out: careful what you wish for - you might end up with a report with 97 columns for 4 years!

    Having said that: You want one "label", that is, column name in the report, per combination of category, month and year.

    So I verticalize your input first; adding a label column lbl that can contain, for example, 'sls_1_2023_cat1', and UNION SELECT the rest of the input, returning the category 1 sales first, the category 2 sales afterwards in one sales column. That's the CREATE TABLE step.

    Then, I use an anonymous code block - the bit between DO $$ and the final $$ (use Vertica's front end vsql to run the script, as many front ends split the text between the two double dollars into single chunks sent to the database, and that does not work).

    In that block, I create a local temporary table (as blocks can't return result tables). I start building a SQL statement with what will be fix, then, for each distinct value, for lbl that I find, I generate an expression of, for example:

    NVL(MAX(CASE lbl WHEN '1_2023_cat1' THEN sales END),0) AS sls_1_2023_cat1 ,

    and concatenate that to the already built sql statement, and end with the hard-wired FROM and GROUP BY clauses.

    Then, I execute my built sql statement, and, after the code block, immediately SELECT * FROM pivot_out . In the same session, as a temp table dies once logged out.

    Create the verticalized table:

    DROP TABLE IF EXISTS vert;
    CREATE TABLE vert AS
    WITH
    indata(Product,Month,Year,Category1Sales,Category2Sales) AS (
              SELECT 'P001',2,2019,150,231
    UNION ALL SELECT 'P001',4,2022,120,52
    UNION ALL SELECT 'P002',1,2023,97,328
    UNION ALL SELECT 'P003',7,2018,46,157
    )
    SELECT
      product
    , "Month"::VARCHAR(2)||'_'||"Year"::VARCHAR(4)||'_cat1' AS lbl
    , Category1Sales AS sales
    FROM indata
    UNION ALL
    SELECT
      product
    , "Month"::VARCHAR(2)||'_'||"Year"::VARCHAR(4)||'_cat2' AS lbl
    , Category2Sales AS sales
    FROM indata
    ORDER BY 1,2;
    

    Pivoting:

     DO $$
     DECLARE
       sq     VARCHAR;
       c      VARCHAR;
     BEGIN
       sq:= CHR(10)||'CREATE LOCAL TEMPORARY TABLE pivot_out ON COMMIT PRESERVE ROWS A
       FOR c IN EXECUTE
       'SELECT DISTINCT lbl FROM vert ORDER BY 1' LOOP
         sq := sq||CHR(10)||', NVL(MAX(CASE lbl WHEN '||QUOTE_LITERAL(c)||' THEN sales
       END LOOP;
       sq := sq ||CHR(10)||'FROM vert GROUP BY product';
     
       -- vv test output ...
       RAISE NOTICE '
         *** Running this Query *** %',sq;
       -- ^^ test output ...
       EXECUTE sq;
     END;
     $$;
     SELECT * FROM pivot_out;
    

    Result:

    product sales_1_2023_cat1 sales_1_2023_cat2 sales_2_2019_cat1 sales_2_2019_cat2 sales_4_2022_cat1 sales_4_2022_cat2 sales_7_2018_cat1 sales_7_2018_cat2
    P001 0 0 150 231 120 52 0 0
    P002 97 328 0 0 0 0 0 0
    P003 0 0 0 0 0 0 46 157