sqlsnowflake-cloud-data-platformtruncatelistagg

Avoid "String is too long" Warning with LISTAGG in Snowflake


I have been using the LISTAGG function in Snowflake to concatenate strings and I triggered the following warning:

100078 (22000): String '(LISTAGG result)' is too long and would be truncated

I understand that this warning is triggered when the aggregated string exceeds a certain length. I want to know the best practices to prevent or handle this warning, given that truncation is fine and not relevant for the quality of column. Should I truncate in advance the result? If so, how?

SELECT
        userid,
        NULLIF(LISTAGG(DISTINCT city, ', '), '') AS cities,
        NULLIF(LISTAGG(DISTINCT region, ', '), '') AS regions,
        ...
FROM {{ ref('myschema.table_T') }}
GROUP BY userid

Solution

  • Since the aggregates string reach the limit, you can't use the LISTAGG function. You could create a user-defined aggregate function instead :

    create or replace function full_array_agg(g string, s string)
    returns table (G string, S array)
    language javascript
    as $$
    {
        processRow: function f(row, rowWriter, context){
            if( this.arr.indexOf(row.S) === -1 ) {
              this.arr.push(row.S)
            }
            this.group = row.G
            this.counter++;
        }
        , initialize: function(argumentInfo, context) {
            this.counter = 0;
            this.arr = [];
        }, finalize: function(rowWriter, context){
            rowWriter.writeRow({G:this.group, S: this.arr})
        }
    }
    $$;
    

    You can use it like this :

    select cities.g as userid, cities.s as cities
    from mytable
        , table(full_array_agg(
            userid::string, 
            city) over(partition by userid)) cities;
    

    Inspired from this answer :