postgresql-9.5

PostgreSQL GROUP BY that includes zeros


I have a SQL query (postgresql) that looks something like this:

SELECT
    my_timestamp::timestamp::date as the_date,
    count(*) as count
FROM my_table
WHERE ...
GROUP BY the_date
ORDER BY the_date

The result is a table of YYYY-MM-DD, count pairs.

Now I've been asked to fill in the empty dates with zero. So if I was previously providing

2022-03-15    3
2022-03-17    1

I'd now want to return

2022-03-15    3
2022-03-16    0
2022-03-17    1

Now I can easily do this client-side (relative to the database) and let my program compute and return the zero-augmented list to its clients based on the original list from postgres. But perhaps it would better if I could just tell postgresql to include zeros.

I suspect this isn't easy at all, because postgres has no obvious way of knowing what I'm up to. But in the interests of learning more about postgres and SQL, I thought I'd have try. The try isn't too promising thus far...

Any pointers before I conclude that I was right to leave this to my (postgres client) program?

Update

This is an interesting case where my simplification of the problem led to a correct answer that didn't work for me. For those who come after, I thought it worth documenting what followed, because it take some fun twists through constructing SQL queries.

@a_horse_with_no_name responded with a query that I've verified works if I simplify my own query to match. Unfortunately, my query had some extra baggage that I didn't think pertinent, and so had trimmed out when posting the original question.

Here's my real (original) query, with all names preserved (if shortened):

-- current query
SELECT
    LEAST(time1, time2, time3, time4)::timestamp::date as the_date,
    count(*) as count
FROM reading_group_reader rgr
INNER JOIN (  SELECT group_id, group_type  ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
  AND TT.group_type = rgr.group_type
WHERE LEAST(time1, time2, time3, time4) >  current_date - 30
GROUP BY the_date
ORDER BY the_date;

If I translate that directly into the proposed solution, however, the inner join between reading_group_reader and the temporary table TT causes the left join to become inner (I think) and the date sequence drops its zeros again. Fwiw, the table TT is a table because sometimes it actually is a subselect.

So I transformed my query into this:

SELECT
    g.dt::date as the_date,
    count(*) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
    SELECT
        LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date
    FROM reading_group_reader rgr
    INNER JOIN (
        SELECT group_id, group_type  ::group_type_name
        FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
    ON TT.group_id = rgr.group_id
      AND TT.group_type = rgr.group_type
) rgrt
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY the_date;

but this outputs 1's instead of 0's at the places that should be 0.

The reason for that, however, is because I've now selected every date, so, of course, there's one of each. I need to include an additional field (which will be NULL) and count that.

So this query finally does what I want:

SELECT
    g.dt::date as the_date,
    count(rgrt.device_id) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
    SELECT
        LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date,
        rgr.device_id
    FROM reading_group_reader rgr
    INNER JOIN (
        SELECT group_id, group_type  ::group_type_name
        FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)
    ) TT
    ON TT.group_id = rgr.group_id
      AND TT.group_type = rgr.group_type
) rgrt(the_date)
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY g.dt;

And, of course, on re-reading the accepted answer, I eventually saw that he did count an unrelated field, which I'd simply missed on my first several readings.


Solution

  • You will need to join to a list of dates. This can e.g. be done using generate_series()

    SELECT g.dt::date as the_date,
           count(t.my_timestamp) as count
    FROM generate_series(date '2022-03-01', 
                         date '2022-03-31', 
                         interval '1 day') as g(dt)
      LEFT JOIN my_table as t 
            ON t.my_timestamp::date = g.dt::date
           AND ... -- the original WHERE clause goes here!
    GROUP BY the_date
    ORDER BY the_date;
    

    Note that the original WHERE conditions need to go into the join condition of the LEFT JOIN. You can't put them into a WHERE clause because that would turn the outer join back into an inner join (which means the missing dates wouldn't be returned).