sql amazon-redshift unpivot union-all amazon-quicksight

Same result without using UNION ALL on repeating SELECT statements?

I built a table in Amazon QuickSight, but the way the viewer wants the data organized in a pivot table, I'm only able to achieve with the below SQL snippet. I don't have a problem leaving it as is, but I'd like to see if there are better ways of doing what I did, especially without using the repetitive parts in each individual table that is being combined together. There are actually a lot more "problem" columns I'm querying, so the actual SQL is very long and very repetitive.

The only part that is changed is problem and problem_count between each table.

'Service' AS problem,
 COUNT(CASE WHEN service_yn = 'Yes' THEN 1 END) AS problem_count

I'm sorry, I don't know how to properly format a pivot table but I'll do my best here. Imagine the following 3 tables are one large pivot table.

Base	Count
Accuracy of Bill	110
Service	200
Cleanliness	95
Value	75

Premium	Count
Accuracy of Bill	110
Service	200
Cleanliness	95
Value	75

Deluxe	Count
Accuracy of Bill	110
Service	200
Cleanliness	95
Value	75

SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Accuracy of bill' AS problem,
       COUNT(CASE WHEN accuracy_of_bill_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

UNION ALL

SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Cleanliness' AS problem,
       COUNT(CASE WHEN cleanliness_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

UNION ALL

SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Service' AS problem,
       COUNT(CASE WHEN service_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

UNION ALL

SELECT CASE
         WHEN descriptor = 'Fast Food' THEN 'Base'
         WHEN descriptor IN ('Sit Down','Bar','Eatery') THEN 'Premium'
         WHEN descriptor IN ('Full Service','Boutique') THEN 'Deluxe'
         ELSE NULL
       END AS "brand_group",
       'Value' AS problem,
       COUNT(CASE WHEN value_yn = 'Yes' THEN 1 END) AS problem_count
FROM surveys
WHERE responsedate >= CURRENT_DATE-INTERVAL '12 months'
AND   surveyid NOT IN (SELECT surveyid FROM excluded)
AND   region IN (1,2,3,4,5)
AND   brand_group IS NOT NULL
GROUP BY brand_group

Solution

Note: this answer is for Postgres, as declared originally. Does not work for Redshift.

You need the reverse operation of a "pivot", a.k.a. "unpivot". Can be done elegantly with CROSS JOIN LATERAL to a VALUES expression after getting all counts in a single SELECT.

A single scan over the table should be substantially faster.
While being at it, I optimized a couple other things.

SELECT brand_group, p.*
FROM  (
   SELECT CASE descriptor
            WHEN 'Fast Food'    THEN 'Base'
            WHEN 'Sit Down'     THEN 'Premium'
            WHEN 'Bar'          THEN 'Premium'
            WHEN 'Eatery'       THEN 'Premium'
            WHEN 'Full Service' THEN 'Deluxe'
            WHEN 'Boutique'     THEN 'Deluxe'
          END AS brand_group
        , count(*) FILTER (WHERE accuracy_of_bill_yn = 'Yes') AS a_ct
        , count(*) FILTER (WHERE cleanliness_yn = 'Yes')      AS c_ct
        , count(*) FILTER (WHERE service_yn = 'Yes')          AS s_ct
        , count(*) FILTER (WHERE value_yn = 'Yes')            AS v_ct
   FROM   surveys s
   WHERE  NOT EXISTS (SELECT FROM excluded e WHERE e.surveyid = s.surveyid)
   AND    responsedate >= now() - interval '12 months'
   AND    region IN (1,2,3,4,5)
   AND    brand_group IS NOT NULL
   GROUP  BY 1
   ) sub
CROSS  JOIN LATERAL (
   VALUES
     ('Accuracy of bill', a_ct)
   , ('Cleanliness'     , c_ct)
   , ('Service'         , s_ct)
   , ('Value'           , v_ct)
   ) p(base, count)
ORDER  BY 1, 2;  -- need that?

fiddle

See:

Avoid NOT IN with subqueries. I replaced it with:

WHERE NOT EXISTS (SELECT FROM excluded e WHERE e.surveyid = s.surveyid)

See:

Select rows which are not present in other table

All of your *_yn columns should really be boolean. Not text or anything else.

About the aggregate FILTER clause:

Aggregate columns with additional (distinct) filters

I use a "switched" CASE. Should be slightly cheaper. See:

Display column name with max value between several columns

Just the base?

Typically, people just want the more compact result from the base query (maybe with more descriptive column names):

SELECT CASE descriptor
         WHEN 'Fast Food'    THEN 'Base'
         WHEN 'Sit Down'     THEN 'Premium'
         WHEN 'Bar'          THEN 'Premium'
         WHEN 'Eatery'       THEN 'Premium'
         WHEN 'Full Service' THEN 'Deluxe'
         WHEN 'Boutique'     THEN 'Deluxe'
       END AS brand_group
     , count(*) FILTER (WHERE accuracy_of_bill_yn = 'Yes') AS a_ct
     , count(*) FILTER (WHERE cleanliness_yn = 'Yes')      AS c_ct
     , count(*) FILTER (WHERE service_yn = 'Yes')          AS s_ct
     , count(*) FILTER (WHERE value_yn = 'Yes')            AS v_ct
FROM   surveys s
-- WHERE ... 
GROUP  BY 1;

fiddle