sqlpostgresqlpivotcasecrosstab

Sum by month and put months as columns


Background

I have time series data on a monthly basis and I would like to sum values for each ID, grouped by month and then have the month names as columns rather than as rows.

Example

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 10    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 1  | abc        | jan   | 15    |
| 3  | ghi        | mar   | 15    |

Desired Result

+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar |
+----+------------+-----+-----+-----+
| 1  | abc        | 25  | 20  | 0   |
| 2  | def        | 10  | 5   | 0   |
| 3  | ghi        | 0   | 0   | 15  |

Current Approach

I can easily group by month, summing the values. Which gets me to:

-----------------------------------
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 25    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 3  | ghi        | mar   | 15    |

But I now need those months as column names. Not sure where to go from here.

Additional Information

Any ideas from an SQL guru very much appreciated!


Solution

  • Setup

    CREATE TABLE tbl (
      id int
    , extra_info varchar(3)
    , month date
    , value int
    );
       
    INSERT INTO tbl VALUES
      (1, 'abc', '2012-01-01', 10)
    , (1, 'abc', '2012-02-01', 20)
    , (2, 'def', '2012-01-01', 10)
    , (2, 'def', '2012-02-01',  5)
    , (1, 'abc', '2012-01-01', 15)
    , (3, 'ghi', '2012-03-01', 15)
    ;
    

    crosstab()

    I would use crosstab() from the additional tablefunc module. Install once per database with:

    CREATE EXTENSION tablefunc;
    

    Basics:

    How to deal with "extra" columns:

    Advanced usage:

    Query

    SELECT * FROM crosstab(
       $$
       SELECT id, min(extra_info), month, sum(value) AS value
       FROM   tbl
       GROUP  BY id, month
       ORDER  BY id, month
       $$
     , $$
       VALUES
         ('jan'::text), ('feb'), ('mar'), ('apr'), ('may'), ('jun')
       , ('jul'),       ('aug'), ('sep'), ('oct'), ('nov'), ('dec')
       $$
       ) AS ct (id  int, extra text
              , jan int, feb int, mar int, apr int, may int, jun int
              , jul int, aug int, sep int, oct int, nov int, dec int);
    

    Obviously, you can only output one extra_info per id. I pick min(extra_info) since you didn't specify. If all are the same per id, you could also group by it additionally.

    Result:

     id | extra | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
    ----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
      1 | abc   |  25 |  20 |     |     |     |     |     |     |     |     |     |
      2 | def   |  10 |   5 |     |     |     |     |     |     |     |     |     |
      3 | ghi   |     |     |  15 |     |     |     |     |     |     |     |     |
    

    db<>fiddle here

    Installing the tablefunc module (once per database) incurs some overhead, but queries are typically faster and shorter.

    Pure SQL

    If you can't or won't install the additional module, plain SQL got s bit faster with the aggregate FILTER clause added with Postgres 9.4. See:

    SELECT id, min(extra_info) AS extra
         , sum(value) FILTER (WHERE month = 'jan') AS jan
         , sum(value) FILTER (WHERE month = 'feb') AS feb
         , sum(value) FILTER (WHERE month = 'mar') AS mar
         , sum(value) FILTER (WHERE month = 'apr') AS apr
         , sum(value) FILTER (WHERE month = 'may') AS may
         , sum(value) FILTER (WHERE month = 'jun') AS jun
         , sum(value) FILTER (WHERE month = 'jul') AS jul
         , sum(value) FILTER (WHERE month = 'aug') AS aug
         , sum(value) FILTER (WHERE month = 'sep') AS sep
         , sum(value) FILTER (WHERE month = 'oct') AS oct
         , sum(value) FILTER (WHERE month = 'nov') AS nov
         , sum(value) FILTER (WHERE month = 'dec') AS dec
    FROM   tbl
    GROUP  BY id
    ORDER  BY id;
    

    0 instead of NULL

    To output 0 instead of NULL for missing values, use COALESCE for either query:

    SELECT id, extra
         , COALESCE(jan, 0) AS jan
         , COALESCE(feb, 0) AS feb
         , COALESCE(mar, 0) AS mar
         , COALESCE(apr, 0) AS apr
         , COALESCE(may, 0) AS may
         , COALESCE(jun, 0) AS jun
         , COALESCE(jul, 0) AS jul
         , COALESCE(aug, 0) AS aug
         , COALESCE(sep, 0) AS sep
         , COALESCE(oct, 0) AS oct
         , COALESCE(nov, 0) AS nov
         , COALESCE(dec, 0) AS dec
    FROM  (<query from above>)
    

    db<>fiddle here