I have following table called sample_events
:
Column | Type
--------+-----
title | text
date | date
with values:
title | date
-------+------------
ev1 | 2017-01-01
ev2 | 2017-01-03
ev3 | 2017-01-02
ev4 | 2017-12-10
ev5 | 2017-12-11
ev6 | 2017-07-28
In order to create a pivot table with the number of events per month in each unique year I used the crosstab function in the form crosstab(text source_sql, text category_sql)
:
SELECT * FROM crosstab (
'SELECT extract(year from date) AS year,
extract(month from date) AS month, count(*)
FROM sample_events
GROUP BY year, month'
,
'SELECT * FROM generate_series(1, 12)'
) AS (
year int, jan int, feb int, mar int,
apr int, may int, jun int, jul int,
aug int, sep int, oct int, nov int, dec int
) ORDER BY year;
Result is as follows and as expected:
year | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----
2017 | 3 | | | | | | 1 | | | | | 2
Now, I would like to create a pivot table with the number of events per day of week in each unique week of the year. I tried following query:
SELECT * FROM crosstab (
'SELECT extract(week from date) AS week,
extract(dow from date) AS day_of_week, count(*)
FROM sample_events
GROUP BY week, day_of_week'
,
'SELECT * FROM generate_series(0, 6)'
) AS (
week int, sun int, mon int, tue int,
wed int, thu int, fri int, sat int
) ORDER BY week;
Result is not as expected:
week | sun | mon | tue | wed | thu | fri | sat
------+-----+-----+-----+-----+-----+-----+-----
1 | | | 1 | | | |
1 | | 1 | | | | |
30 | | | | | | 1 |
49 | 1 | | | | | |
50 | | 1 | | | | |
52 | 1 | | | | | |
All six events are there but for whatever reason there is duplicate week value. I expected the result to be something like:
week | sun | mon | tue | wed | thu | fri | sat
------+-----+-----+-----+-----+-----+-----+-----
1 | | 1 | 1 | | | |
30 | | | | | | 1 |
49 | 1 | | | | | |
50 | | 1 | | | | |
52 | 1 | | | | | |
Questions
1) Why do results from the latter query contain duplicate key values but the former does not?
2) How to create a pivot table with unique week values?
crosstab()
expects ordered input. You need to add ORDER BY
in the input:
SELECT * FROM crosstab (
'SELECT extract(week from date)::int AS week
, extract(dow from date)::int AS day_of_week
, count(*)::int
FROM sample_events
GROUP BY week, day_of_week
ORDER BY week, day_of_week'
, 'SELECT generate_series(0, 6)'
) AS (
week int, sun int, mon int, tue int,
wed int, thu int, fri int, sat int
);
Or just ORDER BY week
.
Strictly speaking, values of the same key (week
in the example) need to be grouped (come in sequence). Keys don't have to be ordered. But the simplest and cheapest way to achieve this is ORDER BY
(which sorts keys additionally).
Or short:
SELECT * FROM crosstab (
'SELECT extract(week from date)::int
, extract(dow from date)::int
, count(*)::int
FROM sample_events
GROUP BY 1, 2
ORDER BY 1, 2' -- or just ORDER BY 1
, 'SELECT generate_series(0, 6)'
) AS ...
Your first example with months happens to work because input data has months in sequence. But this can break any time if the physical order of rows in your table changes (VACUUM
, UPDATE
, ...). You can never rely on the physical order of rows in a relational table.
See: