snowflake-cloud-data-platformlistagg

How to use LISTAGG on multiple columns and remove duplicates


I am trying to use LISTAGG to create one column as a list that is the product of two columns. I am able to execute the query, but when using DISTINCT it still does not remove the duplicate values. Code sample below.

CASE WHEN PLA.SALES_CTRY_LIST IS NULL AND PLA.DP_CTRY_LIST IS NOT NULL 
             THEN PLA.DP_CTRY_LIST 
             WHEN PLA.DP_CTRY_LIST IS NULL AND PLA.SALES_CTRY_LIST IS NOT NULL 
             THEN PLA.SALES_CTRY_LIST 
             WHEN PLA.SALES_CTRY_LIST IS NOT NULL AND PLA.DP_CTRY_LIST IS NOT NULL 
             THEN LISTAGG(DISTINCT PLA.SALES_CTRY_LIST ||', '||PLA.DP_CTRY_LIST)
        END AS "TESTING"

enter image description here


Solution

  • The problem here is that you're using a concatenation statement, so that distinct is looking at unique concatenations, not the unique elements within the list.

    Maybe consider something like this:

    CASE
           WHEN PLA.SALES_CTRY_LIST IS NULL AND PLA.DP_CTRY_LIST IS NOT NULL
               THEN PLA.DP_CTRY_LIST
           WHEN PLA.DP_CTRY_LIST IS NULL AND PLA.SALES_CTRY_LIST IS NOT NULL
               THEN PLA.SALES_CTRY_LIST
           WHEN PLA.SALES_CTRY_LIST = PLA.DP_CTRY_LIST
               THEN PLA.SALES_CTRY_LIST
           WHEN PLA.SALES_CTRY_LIST IS NOT NULL AND PLA.DP_CTRY_LIST IS NOT NULL AND PLA.SALES_CTRY_LIST != PLA.DP_CTRY_LIST
               THEN PLA.SALES_CTRY_LIST || ', ' || PLA.DP_CTRY_LIST
           END AS "TESTING"
    

    This way, if the two values are the same, then you only list the one instance, and if the two values are different, you make a list of them.

    You could also simplify some of those earlier clauses using COALESCE.