snowflake-cloud-data-platform pivot aggregate

Filtering a pivot to only include columns with data [Snowflake]

Somewhat of an unusual request. We have a software that asks survey questions. Any survey can include any number of questions (From 1 to 30 easily, but there is no limit).

The request from the user is to pull the data out with a line for each survey, response_ID and the questions pivoted. I have accomplished this so the data is in the format of:

Survey ID	Response_ID	Question1	Question2...	QuestionK	Q1	Q2
ABC	1	1	Null	Null	NULL	NULL
ABC	2	Null	4	NULL	NULL	NULL
DEF	1	Null	Null	NULL	10	NULL
GHI	1	10	NULL	NULL
GHI	2	NULL	5	NULL

Where Survey ABC has 2 questions scored from 1-10 Survey DEF has 1 question but the question was non-standard so it show up in column Q1 Survey GHI has 2 questions scored from 1-10

If we select survey DEF in our BI tool, we don't want to show columns QUestion1-QuestionK, we just want the columns Q1 and Q2

The way I've thought to do this is to Group by the Survey_ID and Response_ID, but we don't know how many columns there are and so we can't aggregate the rest of the columns

To filter by survey ID and then make a temporary table on just those results and exclude Survey_ID, Response_ID and coalesce(*) as all cols and then only show olumns with data, but this seems to return all columns as I think I need to aggregate first and then have only the columns with data

The desired Output:in the BI tool Filter on SUrvey ID: DEF

Survey	Responses	Q1
DEF	1	10

Filter on Survey ID: ABC

Survey ID	Responses	Question1	Question 2
ABC	2	1	4

Filter: GHI

Survey ID	Response	Question1	Question 2
GHI	2	10	5

Any help on the aggregation, removing columns that are null, etc. would be appreciated.

select survey_ID, Response_ID, max(\*)
from surveypivot
Where survey_ID = 'DEF'
group by Survey_ID,Response_ID

Solution

Disclaimer: It is a strange request to generate column list dynamically based on filter and personally I would not recommend such code to be used.

With that being said, Snowflake SQL is expressive enough to generate such output:

WITH cte AS (
  SELECT *, OBJECT_CONSTRUCT(* EXCLUDE(SURVEY_ID, RESPONSE_ID)) AS obj
  FROM tab
  WHERE Survey_ID = 'GHI'
), cte_unpivot AS (
  SELECT SURVEY_ID, COUNT(*) OVER(PARTITION BY SURVEY_ID) AS RESPONSES, 
         KEY, MAX(VALUE)::NUMBER AS VALUE
  FROM cte
  ,TABLE(FLATTEN(obj))
  GROUP BY SURVEY_ID, KEY
)
SELECT *
FROM cte_unpivot
PIVOT (MAX(VALUE) FOR KEY IN (ANY));

For sample data:

CREATE OR REPLACE TABLE tab(Survey_ID TEXT, Response_ID INT, Question1 INT,
                            Question2 INT, QuestionK INT,Q1 INT, Q2 INT)
AS    SELECT 'ABC',1 , 1,Null,Null, NULL,NULL
UNION SELECT 'ABC', 2,  Null,4,NULL,NULL,NULL
UNION SELECT 'DEF', 1,  Null ,Null, NULL,10 ,NULL
UNION SELECT 'GHI' ,1,  10, NULL, NULL, NULL, NULL
UNION SELECT 'GHI', 2, NULL, 5, NULL, NULL, NULL;

Output: