I have time series data on a monthly basis and I would like to sum values for each ID, grouped by month and then have the month names as columns rather than as rows.
+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1 | abc | jan | 10 |
| 1 | abc | feb | 20 |
| 2 | def | jan | 10 |
| 2 | def | feb | 5 |
| 1 | abc | jan | 15 |
| 3 | ghi | mar | 15 |
+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar |
+----+------------+-----+-----+-----+
| 1 | abc | 25 | 20 | 0 |
| 2 | def | 10 | 5 | 0 |
| 3 | ghi | 0 | 0 | 15 |
I can easily group by month, summing the values. Which gets me to:
-----------------------------------
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1 | abc | jan | 25 |
| 1 | abc | feb | 20 |
| 2 | def | jan | 10 |
| 2 | def | feb | 5 |
| 3 | ghi | mar | 15 |
But I now need those months as column names. Not sure where to go from here.
Any ideas from an SQL guru very much appreciated!
CREATE TABLE tbl (
id int
, extra_info varchar(3)
, month date
, value int
);
INSERT INTO tbl VALUES
(1, 'abc', '2012-01-01', 10)
, (1, 'abc', '2012-02-01', 20)
, (2, 'def', '2012-01-01', 10)
, (2, 'def', '2012-02-01', 5)
, (1, 'abc', '2012-01-01', 15)
, (3, 'ghi', '2012-03-01', 15)
;
crosstab()
I would use crosstab()
from the additional tablefunc
module. Install once per database with:
CREATE EXTENSION tablefunc;
Basics:
How to deal with "extra" columns:
Advanced usage:
SELECT * FROM crosstab(
$$
SELECT id, min(extra_info), month, sum(value) AS value
FROM tbl
GROUP BY id, month
ORDER BY id, month
$$
, $$
VALUES
('jan'::text), ('feb'), ('mar'), ('apr'), ('may'), ('jun')
, ('jul'), ('aug'), ('sep'), ('oct'), ('nov'), ('dec')
$$
) AS ct (id int, extra text
, jan int, feb int, mar int, apr int, may int, jun int
, jul int, aug int, sep int, oct int, nov int, dec int);
Obviously, you can only output one extra_info
per id
. I pick
min(extra_info)
since you didn't specify. If all are the same per id
, you could also group by it additionally.
Result:
id | extra | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
1 | abc | 25 | 20 | | | | | | | | | |
2 | def | 10 | 5 | | | | | | | | | |
3 | ghi | | | 15 | | | | | | | | |
db<>fiddle here
Installing the tablefunc
module (once per database) incurs some overhead, but queries are typically faster and shorter.
If you can't or won't install the additional module, plain SQL got s bit faster with the aggregate FILTER
clause added with Postgres 9.4. See:
SELECT id, min(extra_info) AS extra
, sum(value) FILTER (WHERE month = 'jan') AS jan
, sum(value) FILTER (WHERE month = 'feb') AS feb
, sum(value) FILTER (WHERE month = 'mar') AS mar
, sum(value) FILTER (WHERE month = 'apr') AS apr
, sum(value) FILTER (WHERE month = 'may') AS may
, sum(value) FILTER (WHERE month = 'jun') AS jun
, sum(value) FILTER (WHERE month = 'jul') AS jul
, sum(value) FILTER (WHERE month = 'aug') AS aug
, sum(value) FILTER (WHERE month = 'sep') AS sep
, sum(value) FILTER (WHERE month = 'oct') AS oct
, sum(value) FILTER (WHERE month = 'nov') AS nov
, sum(value) FILTER (WHERE month = 'dec') AS dec
FROM tbl
GROUP BY id
ORDER BY id;
0
instead of NULL
To output 0 instead of NULL for missing values, use COALESCE
for either query:
SELECT id, extra
, COALESCE(jan, 0) AS jan
, COALESCE(feb, 0) AS feb
, COALESCE(mar, 0) AS mar
, COALESCE(apr, 0) AS apr
, COALESCE(may, 0) AS may
, COALESCE(jun, 0) AS jun
, COALESCE(jul, 0) AS jul
, COALESCE(aug, 0) AS aug
, COALESCE(sep, 0) AS sep
, COALESCE(oct, 0) AS oct
, COALESCE(nov, 0) AS nov
, COALESCE(dec, 0) AS dec
FROM (<query from above>)
db<>fiddle here