I want to aggregate a list of strings together, but I know this list of strings will exceed the limit. This set of strings has an identifier associated with it. So a table might look like this
id | my_strings
______________
x1 | string{1}
x1 | string{2}
.
.
.
x1 | string{N}
x2 | string{1}
What I would love for a result is the following:
id | all_strings
------------------
x1 | string{1},...,string{M}
x1 | string{M+1},...,string{N}
x2 | string{1}
So essentially, if all the strings from 1 to N for the id x1 exceed the limit, it cuts it off at the largest possible M and then makes a new record also with x1 as the id where the "all_strings" column is the rest.
Is there anyway in standard Snowflake SQL code to achieve this? I know that the strings are never identical to one another and but the length of the strings is constant.
Any help would be greatly appreciated.
It is possible to add subgroup to determine max number of elements per group:
-- max 5 elements
WITH cte AS (
SELECT *, CEIL(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY str) / 5) AS grp
FROM t
)
SELECT ID, LISTAGG(str, ',') WITHIN GROUP (ORDER BY str) AS all_strings
FROM cte
GROUP BY ID, grp
ORDER BY ID, all_strings;
For input:
CREATE TABLE t(id INT, str TEXT);
INSERT INTO t(id, str)
VALUES (1,'string01'),(1,'string02'),(1,'string03'),(1,'string04'),(1,'string05'),
(1,'string06'),(1,'string07'),(1,'string08'),(1,'string09'),(1,'string10'),(1,'string11'),
(2,'string12'),(2,'string13'),(2,'string14'),(2,'string15'),(2,'string16'),
(2,'string17'),(2,'string18'),(2,'string19');
Output:
ID ALL_STRINGS
1 string01,string02,string03,string04,string05
1 string06,string07,string08,string09,string10
1 string11
2 string12,string13,string14,string15,string16
2 string17,string18,string19