sqlsnowflake-cloud-data-platformoverflowlistagg

How to allow LISTAGG overflow into new record in Snowflake


I want to aggregate a list of strings together, but I know this list of strings will exceed the limit. This set of strings has an identifier associated with it. So a table might look like this

id | my_strings
______________
x1 | string{1}
x1 | string{2}
.
.
.
x1 | string{N}
x2 | string{1}

What I would love for a result is the following:

id | all_strings
------------------
x1 | string{1},...,string{M}
x1 | string{M+1},...,string{N}
x2 | string{1}

So essentially, if all the strings from 1 to N for the id x1 exceed the limit, it cuts it off at the largest possible M and then makes a new record also with x1 as the id where the "all_strings" column is the rest.

Is there anyway in standard Snowflake SQL code to achieve this? I know that the strings are never identical to one another and but the length of the strings is constant.

Any help would be greatly appreciated.


Solution

  • It is possible to add subgroup to determine max number of elements per group:

    -- max 5 elements
    WITH cte AS (
      SELECT *,  CEIL(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY str) / 5) AS grp
      FROM t
    )
    SELECT ID, LISTAGG(str, ',') WITHIN GROUP (ORDER BY str) AS all_strings
    FROM cte
    GROUP BY ID, grp
    ORDER BY ID, all_strings;
    

    For input:

    CREATE TABLE  t(id INT, str TEXT);
    
    INSERT INTO t(id, str)
    VALUES (1,'string01'),(1,'string02'),(1,'string03'),(1,'string04'),(1,'string05'),
    (1,'string06'),(1,'string07'),(1,'string08'),(1,'string09'),(1,'string10'),(1,'string11'),
    (2,'string12'),(2,'string13'),(2,'string14'),(2,'string15'),(2,'string16'),
    (2,'string17'),(2,'string18'),(2,'string19');
    

    Output:

    ID  ALL_STRINGS
    1   string01,string02,string03,string04,string05
    1   string06,string07,string08,string09,string10
    1   string11
    2   string12,string13,string14,string15,string16
    2   string17,string18,string19