databasepostgresqlcountinner-joinaggregate-filter

Postgres COUNT number of column values with INNER JOIN


I am creating a report in Postgres 9.3. This is my SQL Fiddle.
Basically I have two tables, responses and questions, the structure is:

responses
->id
->question_id
->response

questions
->id
->question
->costperlead

for the column response there can only be 3 values, Yes/No/Possbily, and my report should have the columns:

  question_id
, # of Yes Responses
, # of No Responses
, # of Possbily Responses
, Revenue

Then:

# of Yes Responses - count of all Yes values in the response column
# of No Responses - count of all No values in the response column
# of Possbily Responses - count of all 'Possbily' values in the response column

Revenue is the costperlead * (Number of Yes Responses + Number of Possibly Responses).

I don't know how to construct the query, I'm new plus I came from MySQL so some things are different for postgres. In my SQL Fiddle sample most responses are Yes and Null, it's ok eventually, there will be Possibly and No.

So far I have only:

SELECT a.question_id
FROM responses a
INNER JOIN questions b ON a.question_id = b.id
WHERE a.created_at = '2015-07-17'
GROUP BY a.question_id;

Solution

  • Since the only predicate filters rows from table responses, it would be most efficient to aggregate responses first, then join to questions:

    SELECT *, q.costperlead * (r.ct_yes + r.ct_maybe) AS revenue
    FROM  (
       SELECT question_id
            , count(*) FILTER (WHERE response = 'Yes')      AS ct_yes
            , count(*) FILTER (WHERE response = 'No')       AS ct_no
            , count(*) FILTER (WHERE response = 'Possibly') AS ct_maybe
       FROM   responses
       WHERE  created_at = '2015-07-17'
       GROUP  BY 1
       ) r
    JOIN   questions q ON q.id = r.question_id;
    

    db<>fiddle here

    This uses the aggregate FILTER clause (in Postgres 9.4 or later). See:

    Aside: consider implementing response as boolean type with true/false/null.

    For Postgres 9.3:

    SELECT *, q.costperlead * (r.ct_yes + r.ct_maybe) AS revenue
    FROM  (
       SELECT question_id
            , count(response = 'Yes' OR NULL)      AS ct_yes
            , count(response = 'No' OR NULL)       AS ct_no
            , count(response = 'Possibly' OR NULL) AS ct_maybe
       FROM   responses
       WHERE  created_at = '2015-07-17'
       GROUP  BY 1
       ) r
    JOIN   questions q ON q.id = r.question_id;
    

    Old sqlfiddle

    Comprehensive comparison of techniques: