Sum by month and put months as columns

Background

I have time series data on a monthly basis and I would like to sum values for each ID, grouped by month and then have the month names as columns rather than as rows.

Example

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 10    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 1  | abc        | jan   | 15    |
| 3  | ghi        | mar   | 15    |

Desired Result

+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar |
+----+------------+-----+-----+-----+
| 1  | abc        | 25  | 20  | 0   |
| 2  | def        | 10  | 5   | 0   |
| 3  | ghi        | 0   | 0   | 15  |

Current Approach

I can easily group by month, summing the values. Which gets me to:

-----------------------------------
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 25    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 3  | ghi        | mar   | 15    |

But I now need those months as column names. Not sure where to go from here.

Additional Information

In terms of language, this query is to be run in postgres.
The months above are just examples, obviously the real data set is much larger and covers all 12 months across thousands of IDs

Any ideas from an SQL guru very much appreciated!

Solution

Setup

CREATE TABLE tbl (
  id int
, extra_info varchar(3)
, month date
, value int
);
   
INSERT INTO tbl VALUES
  (1, 'abc', '2012-01-01', 10)
, (1, 'abc', '2012-02-01', 20)
, (2, 'def', '2012-01-01', 10)
, (2, 'def', '2012-02-01',  5)
, (1, 'abc', '2012-01-01', 15)
, (3, 'ghi', '2012-03-01', 15)
;

`crosstab()`

I would use crosstab() from the additional tablefunc module. Install once per database with:

CREATE EXTENSION tablefunc;

Basics:

PostgreSQL Crosstab Query

How to deal with "extra" columns:

Pivot on Multiple Columns using Tablefunc

Advanced usage:

Dynamic alternative to pivot with CASE and GROUP BY

Query

SELECT * FROM crosstab(
   $$
   SELECT id, min(extra_info), month, sum(value) AS value
   FROM   tbl
   GROUP  BY id, month
   ORDER  BY id, month
   $$
 , $$
   VALUES
     ('jan'::text), ('feb'), ('mar'), ('apr'), ('may'), ('jun')
   , ('jul'),       ('aug'), ('sep'), ('oct'), ('nov'), ('dec')
   $$
   ) AS ct (id  int, extra text
          , jan int, feb int, mar int, apr int, may int, jun int
          , jul int, aug int, sep int, oct int, nov int, dec int);

Obviously, you can only output one extra_info per id. I pick min(extra_info) since you didn't specify. If all are the same per id, you could also group by it additionally.

Result:

 id | extra | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
  1 | abc   |  25 |  20 |     |     |     |     |     |     |     |     |     |
  2 | def   |  10 |   5 |     |     |     |     |     |     |     |     |     |
  3 | ghi   |     |     |  15 |     |     |     |     |     |     |     |     |

db<>fiddle here

Installing the tablefunc module (once per database) incurs some overhead, but queries are typically faster and shorter.

Pure SQL

If you can't or won't install the additional module, plain SQL got s bit faster with the aggregate FILTER clause added with Postgres 9.4. See:

Aggregate columns with additional (distinct) filters

SELECT id, min(extra_info) AS extra
     , sum(value) FILTER (WHERE month = 'jan') AS jan
     , sum(value) FILTER (WHERE month = 'feb') AS feb
     , sum(value) FILTER (WHERE month = 'mar') AS mar
     , sum(value) FILTER (WHERE month = 'apr') AS apr
     , sum(value) FILTER (WHERE month = 'may') AS may
     , sum(value) FILTER (WHERE month = 'jun') AS jun
     , sum(value) FILTER (WHERE month = 'jul') AS jul
     , sum(value) FILTER (WHERE month = 'aug') AS aug
     , sum(value) FILTER (WHERE month = 'sep') AS sep
     , sum(value) FILTER (WHERE month = 'oct') AS oct
     , sum(value) FILTER (WHERE month = 'nov') AS nov
     , sum(value) FILTER (WHERE month = 'dec') AS dec
FROM   tbl
GROUP  BY id
ORDER  BY id;

`0` instead of `NULL`

To output 0 instead of NULL for missing values, use COALESCE for either query:

SELECT id, extra
     , COALESCE(jan, 0) AS jan
     , COALESCE(feb, 0) AS feb
     , COALESCE(mar, 0) AS mar
     , COALESCE(apr, 0) AS apr
     , COALESCE(may, 0) AS may
     , COALESCE(jun, 0) AS jun
     , COALESCE(jul, 0) AS jul
     , COALESCE(aug, 0) AS aug
     , COALESCE(sep, 0) AS sep
     , COALESCE(oct, 0) AS oct
     , COALESCE(nov, 0) AS nov
     , COALESCE(dec, 0) AS dec
FROM  (<query from above>)

db<>fiddle here