Somewhat of an unusual request. We have a software that asks survey questions. Any survey can include any number of questions (From 1 to 30 easily, but there is no limit).
The request from the user is to pull the data out with a line for each survey, response_ID and the questions pivoted. I have accomplished this so the data is in the format of:
Survey ID | Response_ID | Question1 | Question2... | QuestionK | Q1 | Q2 |
---|---|---|---|---|---|---|
ABC | 1 | 1 | Null | Null | NULL | NULL |
ABC | 2 | Null | 4 | NULL | NULL | NULL |
DEF | 1 | Null | Null | NULL | 10 | NULL |
GHI | 1 | 10 | NULL | NULL | ||
GHI | 2 | NULL | 5 | NULL |
Where Survey ABC has 2 questions scored from 1-10 Survey DEF has 1 question but the question was non-standard so it show up in column Q1 Survey GHI has 2 questions scored from 1-10
If we select survey DEF in our BI tool, we don't want to show columns QUestion1-QuestionK, we just want the columns Q1 and Q2
The way I've thought to do this is to Group by the Survey_ID and Response_ID, but we don't know how many columns there are and so we can't aggregate the rest of the columns
To filter by survey ID and then make a temporary table on just those results and exclude Survey_ID, Response_ID and coalesce(*) as all cols and then only show olumns with data, but this seems to return all columns as I think I need to aggregate first and then have only the columns with data
The desired Output:in the BI tool
Filter on SUrvey ID: DEF
Survey | Responses | Q1 |
---|---|---|
DEF | 1 | 10 |
Filter on Survey ID: ABC
Survey ID | Responses | Question1 | Question 2 |
---|---|---|---|
ABC | 2 | 1 | 4 |
Filter: GHI
Survey ID | Response | Question1 | Question 2 |
---|---|---|---|
GHI | 2 | 10 | 5 |
Any help on the aggregation, removing columns that are null, etc. would be appreciated.
select survey_ID, Response_ID, max(\*)
from surveypivot
Where survey_ID = 'DEF'
group by Survey_ID,Response_ID
Disclaimer: It is a strange request to generate column list dynamically based on filter and personally I would not recommend such code to be used.
With that being said, Snowflake SQL is expressive enough to generate such output:
WITH cte AS (
SELECT *, OBJECT_CONSTRUCT(* EXCLUDE(SURVEY_ID, RESPONSE_ID)) AS obj
FROM tab
WHERE Survey_ID = 'GHI'
), cte_unpivot AS (
SELECT SURVEY_ID, COUNT(*) OVER(PARTITION BY SURVEY_ID) AS RESPONSES,
KEY, MAX(VALUE)::NUMBER AS VALUE
FROM cte
,TABLE(FLATTEN(obj))
GROUP BY SURVEY_ID, KEY
)
SELECT *
FROM cte_unpivot
PIVOT (MAX(VALUE) FOR KEY IN (ANY));
For sample data:
CREATE OR REPLACE TABLE tab(Survey_ID TEXT, Response_ID INT, Question1 INT,
Question2 INT, QuestionK INT,Q1 INT, Q2 INT)
AS SELECT 'ABC',1 , 1,Null,Null, NULL,NULL
UNION SELECT 'ABC', 2, Null,4,NULL,NULL,NULL
UNION SELECT 'DEF', 1, Null ,Null, NULL,10 ,NULL
UNION SELECT 'GHI' ,1, 10, NULL, NULL, NULL, NULL
UNION SELECT 'GHI', 2, NULL, 5, NULL, NULL, NULL;
Output:
Related: