sqlsql-serverdistinctsql-server-2017string-aggregation

Produce DISTINCT values in STRING_AGG


I'm using the STRING_AGG function in SQL Server 2017. I'd like to create the same effect as COUNT(DISTINCT <column>). I tried STRING_AGG(DISTINCT <column>,',') but that is not legal syntax.

I'd like to know if there is a T-SQL work-around. Here is my sample:

WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 
SELECT State, City, COUNT(DISTINCT Siting) [# Of Types], STRING_AGG(Siting,',') Animals
FROM Sitings 
GROUP BY State, City

The above produces the following result:

+---------+-----------+--------------+-------------------------+
|  State  |   City    | # Of Types   |         Animals         |
+---------+-----------+--------------+-------------------------+
| Arizona | Flagstaff |            1 | dog                     |
| Florida | Orlando   |            2 | dog,bird                |
| Arizona | Phoenix   |            2 | bird,bird,bird,dog,bird |
+---------+-----------+--------------+-------------------------+

The output is exactly what I want, except I want the concatenated "Animals" listed for Phoenix Arizona to be DISTINCT, like this:

+---------+-----------+--------------+--------------------+
|  State  |   City    | # Of Types   |      Animals       |
+---------+-----------+--------------+--------------------+
| Arizona | Flagstaff |            1 | dog                |
| Florida | Orlando   |            2 | dog,bird           |
| Arizona | Phoenix   |            2 | bird,dog           |
+---------+-----------+--------------+--------------------+

Any ideas?

When I use my real data set, which is much larger, I get an error about the "Animals" column exceeding 8000 characters.

My question I think is the same as this one, except my example is much simpler.


Solution

  • Here is one way to do it.

    Since you want the distinct counts as well, it can be done simply by grouping the rows twice. The first GROUP BY will remove duplicates, the second GROUP BY will produce the final result.

    WITH
    Sitings
    AS
    (
        SELECT * FROM (VALUES 
        (1, 'Florida', 'Orlando', 'bird'),
        (2, 'Florida', 'Orlando', 'dog'),
        (3, 'Arizona', 'Phoenix', 'bird'),
        (4, 'Arizona', 'Phoenix', 'dog'),
        (5, 'Arizona', 'Phoenix', 'bird'),
        (6, 'Arizona', 'Phoenix', 'bird'),
        (7, 'Arizona', 'Phoenix', 'bird'),
        (8, 'Arizona', 'Flagstaff', 'dog')
        ) F (ID, State, City, Siting)
    )
    ,CTE_Animals
    AS
    (
        SELECT
            State, City, Siting
        FROM Sitings
        GROUP BY State, City, Siting
    )
    SELECT
        State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting,',') AS Animals
    FROM CTE_Animals
    GROUP BY State, City
    ORDER BY
        State
        ,City
    ;
    

    Result

    +---------+-----------+--------------+----------+
    |  State  |   City    | # Of Sitings | Animals  |
    +---------+-----------+--------------+----------+
    | Arizona | Flagstaff |            1 | dog      |
    | Arizona | Phoenix   |            2 | bird,dog |
    | Florida | Orlando   |            2 | bird,dog |
    +---------+-----------+--------------+----------+
    

    If you are still getting an error message about exceeding 8000 characters, then cast the values to varchar(max) before STRING_AGG.

    Something like

    STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals