I have a table ProductCategorySales
in Vertica
that has below data
Product, Month, Year, Category1Sales, Category2Sales
P001, 2, 2019, 150, 231
P001, 4, 2022, 120, 52
P002, 1, 2023, 97, 328
P003, 7, 2018, 46, 157
I need to query this table to form a resultset as below.
Product, 2_2019_Cateogory1Sales, 2_2019_Category2Sales, 4_2022_Category1Sales, 4_2022_Category2Sales, 1_2023_Category1Sales, 1_2023_Category2Sales, 7_2018_Category1Sales, 7_2018_Category2Sales
P001, 150, 231, 120,52,0,0,0,0
P002, 0,0,0,0,97,328,0,0
P003, 0,0,0,0,0,0,46,157
So basically the month and year are pivoted to concatenate with Category1Sales and Category2Sales using underscores to form columns and group by is applied on Product which will result in single row for each product.
One thing to note here is that there could be any month year for given products, so i cant write switch case statement, my pivot query needs to be dynamic.
Can someone help me with this?
As @minatverma has pointed out: careful what you wish for - you might end up with a report with 97 columns for 4 years!
Having said that: You want one "label", that is, column name in the report, per combination of category, month and year.
So I verticalize your input first; adding a label column lbl
that can contain, for example, 'sls_1_2023_cat1', and UNION SELECT the rest of the input, returning the category 1 sales first, the category 2 sales afterwards in one sales
column. That's the CREATE TABLE
step.
Then, I use an anonymous code block - the bit between DO $$
and the final $$
(use Vertica's front end vsql
to run the script, as many front ends split the text between the two double dollars into single chunks sent to the database, and that does not work).
In that block, I create a local temporary table (as blocks can't return result tables). I start building a SQL statement with what will be fix, then, for each distinct value, for lbl
that I find, I generate an expression of, for example:
NVL(MAX(CASE lbl WHEN '1_2023_cat1' THEN sales END),0) AS sls_1_2023_cat1
,
and concatenate that to the already built sql statement, and end with the hard-wired FROM
and GROUP BY
clauses.
Then, I execute my built sql statement, and, after the code block, immediately SELECT * FROM pivot_out
. In the same session, as a temp table dies once logged out.
Create the verticalized table:
DROP TABLE IF EXISTS vert;
CREATE TABLE vert AS
WITH
indata(Product,Month,Year,Category1Sales,Category2Sales) AS (
SELECT 'P001',2,2019,150,231
UNION ALL SELECT 'P001',4,2022,120,52
UNION ALL SELECT 'P002',1,2023,97,328
UNION ALL SELECT 'P003',7,2018,46,157
)
SELECT
product
, "Month"::VARCHAR(2)||'_'||"Year"::VARCHAR(4)||'_cat1' AS lbl
, Category1Sales AS sales
FROM indata
UNION ALL
SELECT
product
, "Month"::VARCHAR(2)||'_'||"Year"::VARCHAR(4)||'_cat2' AS lbl
, Category2Sales AS sales
FROM indata
ORDER BY 1,2;
Pivoting:
DO $$
DECLARE
sq VARCHAR;
c VARCHAR;
BEGIN
sq:= CHR(10)||'CREATE LOCAL TEMPORARY TABLE pivot_out ON COMMIT PRESERVE ROWS A
FOR c IN EXECUTE
'SELECT DISTINCT lbl FROM vert ORDER BY 1' LOOP
sq := sq||CHR(10)||', NVL(MAX(CASE lbl WHEN '||QUOTE_LITERAL(c)||' THEN sales
END LOOP;
sq := sq ||CHR(10)||'FROM vert GROUP BY product';
-- vv test output ...
RAISE NOTICE '
*** Running this Query *** %',sq;
-- ^^ test output ...
EXECUTE sq;
END;
$$;
SELECT * FROM pivot_out;
Result:
product | sales_1_2023_cat1 | sales_1_2023_cat2 | sales_2_2019_cat1 | sales_2_2019_cat2 | sales_4_2022_cat1 | sales_4_2022_cat2 | sales_7_2018_cat1 | sales_7_2018_cat2 |
---|---|---|---|---|---|---|---|---|
P001 | 0 | 0 | 150 | 231 | 120 | 52 | 0 | 0 |
P002 | 97 | 328 | 0 | 0 | 0 | 0 | 0 | 0 |
P003 | 0 | 0 | 0 | 0 | 0 | 0 | 46 | 157 |