I'm creating a web app that displays a pie chart. In order to get all the data for the chart from a PostgreSQL 9.3 database in a single HTTP request, I'm combining multiple SELECT
statements with UNION ALL
— here's a portion:
SELECT 'spf' as type, COUNT(*)
FROM (SELECT cai.id
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
JOIN quizzes_quiz q ON q.id = cai.activity_id
WHERE cai.end_time::date = '2015-09-12'
AND q.name != 'Exit Ticket Quiz'
AND cai.activity_type = 'QZ'
AND (cas.key = 'disable_student_nav' AND cas.value = 'True'
OR cas.key = 'pacing' AND cas.value = 'student')
GROUP BY cai.id
HAVING COUNT(cai.id) = 2) sub
UNION ALL
SELECT 'spn' as type, COUNT(*)
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-12'
AND cai.activity_type = 'QZ'
AND cas.key = 'disable_student_nav'
AND cas.value = 'False'
UNION ALL
SELECT 'tp' as type, COUNT(*)
FROM (SELECT cai.id
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-12'
AND cai.activity_type = 'QZ'
AND cas.key = 'pacing' AND cas.value = 'teacher') sub;
This produces a nice, small response for sending back to the client:
type | count
------+---------
spf | 100153
spn | 96402
tp | 84211
I wonder if my queries can be made more efficient. Each SELECT statement uses mostly the same JOIN operations. Is there a way to not repeat the JOIN for each new SELECT?
And I would actually prefer a single row with 3 columns.
Or, in general, is there some entirely different but better approach than what I'm doing?
You can bundle most of the cost in a single main query in a CTE and reuse the result several times.
This returns a single row with three columns named after each type
(as requested in the comment):
WITH cte AS (
SELECT cai.id, cai.activity_id, cas.key, cas.value
FROM common_activityinstance cai
JOIN common_activityinstance_settings s ON s.activityinstance_id = cai.id
JOIN common_activitysetting cas ON cas.id = s.id
WHERE cai.end_time::date = '2015-09-12' -- problem?
AND cai.activity_type = 'QZ'
AND (cas.key = 'disable_student_nav' AND cas.value IN ('True', 'False') OR
cas.key = 'pacing' AND cas.value IN ('student', 'teacher'))
)
SELECT *
FROM (
SELECT count(*) AS spf
FROM (
SELECT c.id
FROM cte c
JOIN quizzes_quiz q ON q.id = c.activity_id
WHERE q.name <> 'Exit Ticket Quiz'
AND (c.key, c.value) IN (('disable_student_nav', 'True')
, ('pacing', 'student'))
GROUP BY 1
HAVING count(*) = 2
) sub
) spf
, (
SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
, count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
FROM cte
) spn_tp;
Should work for Postgres 9.3. In Postgres 9.4 use the new aggregate FILTER
clause:
count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn
, count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher') AS tp
Details for both syntax variants:
The condition marked problem?
may be big performance problem, depending on the data type of cai.end_time
. For one, it's not sargable. And if it's a timestamptz
type, the expression is hard to index, because the result depends on the current time zone setting of the session - which can also lead to different results when executed in different time zones.
Compare:
You just have to name the time zone that is supposed to define your date. Taking my time zone in Vienna as example:
WHERE cai.end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
AND cai.end_time < '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
You can provide simple timestamptz
values as well. You could even just:
WHERE cai.end_time >= '2015-09-12'::date
AND cai.end_time < '2015-09-12'::date + 1
But the first variant does not depend on the current time zone setting.
Detailed explanation in the links above.
Now the query can use your index and should be much faster if there are many different days in your table.