I built a table in Amazon QuickSight, but the way the viewer wants the data organized in a pivot table, I'm only able to achieve with the below SQL snippet. I don't have a problem leaving it as is, but I'd like to see if there are better ways of doing what I did, especially without using the repetitive parts in each individual table that is being combined together. There are actually a lot more "problem" columns I'm querying, so the actual SQL is very long and very repetitive.
The only part that is changed is problem
and problem_count
between each table.
'Service' AS problem,
COUNT(CASE WHEN service_yn = 'Yes' THEN 1 END) AS problem_count
I'm sorry, I don't know how to properly format a pivot table but I'll do my best here. Imagine the following 3 tables are one large pivot table.
Base | Count |
---|---|
Accuracy of Bill | 110 |
Service | 200 |
Cleanliness | 95 |
Value | 75 |
Premium | Count |
---|---|
Accuracy of Bill | 110 |
Service | 200 |
Cleanliness | 95 |
Value | 75 |
Deluxe | Count |
---|---|
Accuracy of Bill | 110 |
Service | 200 |
Cleanliness | 95 |
Value | 75 |
SELECT CASE
WHEN descriptor = 'Fast Food' THEN 'Base'
WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
ELSE NULL
END AS "brand_group",
'Accuracy of bill' AS problem,
COUNT(CASE WHEN accuracy_of_bill_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND surveyid NOT IN (SELECT surveyid FROM excluded)
AND region IN (1,2,3,4,5)
AND brand_group IS NOT NULL
GROUP BY brand_group
UNION ALL
SELECT CASE
WHEN descriptor = 'Fast Food' THEN 'Base'
WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
ELSE NULL
END AS "brand_group",
'Cleanliness' AS problem,
COUNT(CASE WHEN cleanliness_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND surveyid NOT IN (SELECT surveyid FROM excluded)
AND region IN (1,2,3,4,5)
AND brand_group IS NOT NULL
GROUP BY brand_group
UNION ALL
SELECT CASE
WHEN descriptor = 'Fast Food' THEN 'Base'
WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
ELSE NULL
END AS "brand_group",
'Service' AS problem,
COUNT(CASE WHEN service_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND surveyid NOT IN (SELECT surveyid FROM excluded)
AND region IN (1,2,3,4,5)
AND brand_group IS NOT NULL
GROUP BY brand_group
UNION ALL
SELECT CASE
WHEN descriptor = 'Fast Food' THEN 'Base'
WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
ELSE NULL
END AS "brand_group",
'Value' AS problem,
COUNT(CASE WHEN value_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND surveyid NOT IN (SELECT surveyid FROM excluded)
AND region IN (1,2,3,4,5)
AND brand_group IS NOT NULL
GROUP BY brand_group
Note: this answer is for Postgres, as declared originally. Does not work for Redshift.
You need the reverse operation of a "pivot", a.k.a. "unpivot". Can be done elegantly with CROSS JOIN LATERAL
to a VALUES
expression after getting all counts in a single SELECT
.
A single scan over the table should be substantially faster.
While being at it, I optimized a couple other things.
SELECT brand_group, p.*
FROM (
SELECT CASE descriptor
WHEN 'Fast Food' THEN 'Base'
WHEN 'Sit Down' THEN 'Premium'
WHEN 'Bar' THEN 'Premium'
WHEN 'Eatery' THEN 'Premium'
WHEN 'Full Service' THEN 'Deluxe'
WHEN 'Boutique' THEN 'Deluxe'
END AS brand_group
, count(*) FILTER (WHERE accuracy_of_bill_yn = 'Yes') AS a_ct
, count(*) FILTER (WHERE cleanliness_yn = 'Yes') AS c_ct
, count(*) FILTER (WHERE service_yn = 'Yes') AS s_ct
, count(*) FILTER (WHERE value_yn = 'Yes') AS v_ct
FROM surveys s
WHERE NOT EXISTS (SELECT FROM excluded e WHERE e.surveyid = s.surveyid)
AND responsedate >= now() - interval '12 months'
AND region IN (1,2,3,4,5)
AND brand_group IS NOT NULL
GROUP BY 1
) sub
CROSS JOIN LATERAL (
VALUES
('Accuracy of bill', a_ct)
, ('Cleanliness' , c_ct)
, ('Service' , s_ct)
, ('Value' , v_ct)
) p(base, count)
ORDER BY 1, 2; -- need that?
See:
Avoid NOT IN
with subqueries. I replaced it with:
WHERE NOT EXISTS (SELECT FROM excluded e WHERE e.surveyid = s.surveyid)
See:
All of your *_yn
columns should really be boolean
. Not text
or anything else.
About the aggregate FILTER
clause:
I use a "switched" CASE
. Should be slightly cheaper. See:
Typically, people just want the more compact result from the base query (maybe with more descriptive column names):
SELECT CASE descriptor
WHEN 'Fast Food' THEN 'Base'
WHEN 'Sit Down' THEN 'Premium'
WHEN 'Bar' THEN 'Premium'
WHEN 'Eatery' THEN 'Premium'
WHEN 'Full Service' THEN 'Deluxe'
WHEN 'Boutique' THEN 'Deluxe'
END AS brand_group
, count(*) FILTER (WHERE accuracy_of_bill_yn = 'Yes') AS a_ct
, count(*) FILTER (WHERE cleanliness_yn = 'Yes') AS c_ct
, count(*) FILTER (WHERE service_yn = 'Yes') AS s_ct
, count(*) FILTER (WHERE value_yn = 'Yes') AS v_ct
FROM surveys s
-- WHERE ...
GROUP BY 1;