postgresqldatecrosstab

Why does this crosstab() query return duplicate keys?


I have following table called sample_events:

 Column | Type
--------+-----
 title  | text
 date   | date

with values:

 title |     date
-------+------------
 ev1   | 2017-01-01
 ev2   | 2017-01-03
 ev3   | 2017-01-02
 ev4   | 2017-12-10
 ev5   | 2017-12-11
 ev6   | 2017-07-28

In order to create a pivot table with the number of events per month in each unique year I used the crosstab function in the form crosstab(text source_sql, text category_sql):

SELECT * FROM crosstab (
   'SELECT extract(year from date) AS year,
        extract(month from date) AS month, count(*)
    FROM sample_events
    GROUP BY year, month'
,
   'SELECT * FROM generate_series(1, 12)'
) AS (
    year int, jan int, feb int, mar int,
    apr int, may int, jun int, jul int,
    aug int, sep int, oct int, nov int, dec int
) ORDER BY year;

Result is as follows and as expected:

 year | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----
 2017 |   3 |     |     |     |     |     |   1 |     |     |     |     |   2

Now, I would like to create a pivot table with the number of events per day of week in each unique week of the year. I tried following query:

SELECT * FROM crosstab (
   'SELECT extract(week from date) AS week,
        extract(dow from date) AS day_of_week, count(*)
    FROM sample_events
    GROUP BY week, day_of_week'
,
   'SELECT * FROM generate_series(0, 6)'
) AS (
    week int, sun int, mon int, tue int,
    wed int, thu int, fri int, sat int
) ORDER BY week;

Result is not as expected:

 week | sun | mon | tue | wed | thu | fri | sat 
------+-----+-----+-----+-----+-----+-----+-----
    1 |     |     |   1 |     |     |     |    
    1 |     |   1 |     |     |     |     |    
   30 |     |     |     |     |     |   1 |    
   49 |   1 |     |     |     |     |     |    
   50 |     |   1 |     |     |     |     |    
   52 |   1 |     |     |     |     |     |    

All six events are there but for whatever reason there is duplicate week value. I expected the result to be something like:

 week | sun | mon | tue | wed | thu | fri | sat 
------+-----+-----+-----+-----+-----+-----+-----
    1 |     |   1 |   1 |     |     |     |    
   30 |     |     |     |     |     |   1 |    
   49 |   1 |     |     |     |     |     |    
   50 |     |   1 |     |     |     |     |    
   52 |   1 |     |     |     |     |     |    

Questions

1) Why do results from the latter query contain duplicate key values but the former does not?

2) How to create a pivot table with unique week values?


Solution

  • crosstab() expects ordered input. You need to add ORDER BY in the input:

    SELECT * FROM crosstab (
       'SELECT extract(week from date)::int AS week
             , extract(dow  from date)::int AS day_of_week
             , count(*)::int
        FROM   sample_events
        GROUP  BY week, day_of_week
        ORDER  BY week, day_of_week'
     , 'SELECT generate_series(0, 6)'
       ) AS (
        week int, sun int, mon int, tue int,
        wed int, thu int, fri int, sat int
    );
    

    Or just ORDER BY week.

    Strictly speaking, values of the same key (week in the example) need to be grouped (come in sequence). Keys don't have to be ordered. But the simplest and cheapest way to achieve this is ORDER BY (which sorts keys additionally).

    Or short:

    SELECT * FROM crosstab (
       'SELECT extract(week from date)::int
             , extract(dow  from date)::int
             , count(*)::int
        FROM   sample_events
        GROUP  BY 1, 2
        ORDER  BY 1, 2'  -- or just ORDER BY 1
     ,  'SELECT generate_series(0, 6)'
    ) AS ...
    

    Your first example with months happens to work because input data has months in sequence. But this can break any time if the physical order of rows in your table changes (VACUUM, UPDATE, ...). You can never rely on the physical order of rows in a relational table.

    See: