sqlpostgresqlgroup-bystring-aggregation

How to concatenate strings of a string field in a PostgreSQL 'group by' query?


I am looking for a way to concatenate the strings of a field within a group by query. So for example, I have a table:

ID COMPANY_ID EMPLOYEE
1 1 Anna
2 1 Bill
3 2 Carol
4 2 Dave

and I wanted to group by company_id to get something like:

COMPANY_ID EMPLOYEE
1 Anna, Bill
2 Carol, Dave

There is a built-in function in mySQL to do this group_concat


Solution

  • PostgreSQL 9.0 or later:

    Modern Postgres (since 2010) has the string_agg(expression, delimiter) function which will do exactly what the asker was looking for:

    SELECT company_id, string_agg(employee, ', ')
    FROM mytable
    GROUP BY company_id;
    

    Postgres 9 also added the ability to specify an ORDER BY clause in any aggregate expression; otherwise you have to order all your results or deal with an undefined order. So you can now write:

    SELECT company_id, string_agg(employee, ', ' ORDER BY employee)
    FROM mytable
    GROUP BY company_id;
    

    PostgreSQL 8.4.x:

    Please note that support for Postgres 8.4 ended in 2014, so you should probably upgrade for more important reasons than string aggregation.

    PostgreSQL 8.4 (in 2009) introduced the aggregate function array_agg(expression) which collects the values in an array. Then array_to_string() can be used to give the desired result:

    SELECT company_id, array_to_string(array_agg(employee), ', ')
    FROM mytable
    GROUP BY company_id;
    

    PostgreSQL 8.3.x and older:

    When this question was originally posed, there was no built-in aggregate function to concatenate strings. The simplest custom implementation (suggested by Vajda Gabo in this mailing list post, among many others) is to use the built-in textcat function:

    CREATE AGGREGATE textcat_all(
      basetype    = text,
      sfunc       = textcat,
      stype       = text,
      initcond    = ''
    );
    

    Here is the CREATE AGGREGATE documentation.

    This simply glues all the strings together, with no separator. In order to get a ", " inserted in between them without having it at the end, you might want to make your own concatenation function and substitute it for the "textcat" above. Here is one I put together and tested on 8.3.12:

    CREATE FUNCTION commacat(acc text, instr text) RETURNS text AS $$
      BEGIN
        IF acc IS NULL OR acc = '' THEN
          RETURN instr;
        ELSE
          RETURN acc || ', ' || instr;
        END IF;
      END;
    $$ LANGUAGE plpgsql;
    

    This version will output a comma even if the value in the row is null or empty, so you get output like this:

    a, b, c, , e, , g
    

    If you would prefer to remove extra commas to output this:

    a, b, c, e, g
    

    Then add an ELSIF check to the function like this:

    CREATE FUNCTION commacat_ignore_nulls(acc text, instr text) RETURNS text AS $$
      BEGIN
        IF acc IS NULL OR acc = '' THEN
          RETURN instr;
        ELSIF instr IS NULL OR instr = '' THEN
          RETURN acc;
        ELSE
          RETURN acc || ', ' || instr;
        END IF;
      END;
    $$ LANGUAGE plpgsql;