sqlpostgresqlcrosstab

Crosstab to show count of rows per weekday for each group


I have a database with a single table. The table includes a column called threat_group, and another called post_date. This query gives me a list of all posts for each threat_group per weekday:

SELECT
    distinct threat_group  AS "Group"
    ,extract('dow' from post_date) AS "Weekday"
    ,count(post_id) AS "Reports"

FROM
    ransomwatch_posts
WHERE 
    post_date BETWEEN '<start_date>' and '<end_date>'
group by 1,2
order by 2

Now, there are two issues from a usability/viewability perspective that I'm trying to solve for.

First, if there are no entries at all for a given threat_group on a given day, there will just be no result. For example, if the group conti does not have any posts on Sundays, there will not be any lines like this:

threat_group weekday reports
conti 0 0

I would like to get NULL (preferably 0) for the sake of building out graphs or tables.

Second, I want this data consolidated, I think through use of crosstab, but I've never built a crosstab, so it looks like this:

threat_group Sun Mon Tues Wed Thu Fri Sat
conti 0 2 4 1 12 0 0

Can someone provide some assistance here? The best attempt I've come up with was this:

SELECT * 
FROM crosstab(
$$
SELECT
    distinct threat_group  AS "Group"
    ,extract('dow' from post_date) AS "Weekday"
    ,count(post_id) AS "Reports"

FROM
    ransomwatch_posts
WHERE 
    post_date BETWEEN '2022-04-01' and '2022-06-30'
group by 1,2
order by 2
$$
) as ct("Group" TEXT, "0" numeric, "1" numeric, "2" numeric, "3" numeric, "4" numeric, "5" numeric, "6" numeric)

which fails with:

ERROR:  return and sql tuple descriptions are incompatible

Solution

  • To get NULL for missing values, you need the 2-parameter variant of crosstab():

    SELECT * 
    FROM   crosstab(
    $$
    SELECT threat_group                      -- AS grp
         , extract('isodow' FROM post_date)  -- AS weekday
         , count(*)                          -- AS reports
    FROM   ransomwatch_posts
    WHERE  post_date BETWEEN '2022-04-01' and '2022-06-30'
    GROUP  BY 1,2
    ORDER  BY 1,2
    $$
    , 'VALUES (1),(2),(3),(4),(5),(6),(7)'
    ) AS ct(grp text, mon int, tue int, wed int, thu int, fri int, sat int, sun int)
    
    

    See:

    And no DISTINCT here.

    And ORDER BY 1,2.

    And I suggest 'isodow' instead of 'dow' to get 1 - 7 for Mon - Sun.

    And the faster count(*) - unless the column post_id can be NULL (which seems unlikely).

    To get 0 instead of NULL for missing values:

    SELECT grp
         , COALESCE(mon, 0) AS mon
         , COALESCE(tue, 0) AS tue
         , COALESCE(wed, 0) AS wed
         , COALESCE(thu, 0) AS thu
         , COALESCE(fri, 0) AS fri
         , COALESCE(sat, 0) AS sat
         , COALESCE(sun, 0) AS sun
    FROM  crosstab( ...