sqlsnowflake-cloud-data-platform

Snowflake SQL - Group by in order


How can I achieve this in Snowflake SQL?

I want to group by ID column and maintain same order for the field values. Also if all values are same for columns: currency or UOM, I just want to list them once.

ID Sub_ID Value Currency QTY UOM
1001 1 10 USD 1 cm
1001 2 200 USD 100 cm
1001 3 90 USD 10 cm

Desired result:

ID Sub_ID Value Currency QTY UOM
1001 1,2,3 10,200,90 USD 1,100,10 cm

Solution

  • To get all aggregated columns in the same order, you will listagg(…) within group (order by Sub_ID), so they all have the same order.

    Then to know if we have only identical values in a column (that we will want to merge), you can rely on aggregated columns with the same values having the same min() and max().
    N.B.: if you have some nulls in the column, but all other values are identical, the nulls will get ignored and you'll end up with the identical value as the "merged" only value for the column (because min(x, x, null) = max(null, null, x) = x).

    For the merge operation itself, if the min() = max() condition returns true, you can naturally choose any one of min() or max() to obtain the value.

    select
      ID,
      case when min(Sub_ID) = max(Sub_ID) then min(Sub_ID) else listagg(Sub_ID, ',') within group (order by Sub_ID) end Sub_ID,
      case when min(Value) = max(Value) then min(Value) else listagg(Value, ',') within group (order by Sub_ID) end Value,
      case when min(Currency) = max(Currency) then min(Currency) else listagg(Currency, ',') within group (order by Sub_ID) end Currency,
      case when min(QTY) = max(QTY) then min(QTY) else listagg(QTY, ',') within group (order by Sub_ID) end QTY,
      case when min(UOM) = max(UOM) then min(UOM) else listagg(UOM, ',') within group (order by Sub_ID) end UOM
    from test
    group by ID;
    

    This is theorical code, composed by analogy to a working fiddle in PostgreSQL or one in SQL Server

    And note that I used only varchar columns, so that the min() as well as the listagg() are varchars; if you have integers you should probably cast each min() to varchar to avoid a type conflict between the left and right part of the case.

    id sub_id value currency qty uom
    1001 1,2,3 10,200,90 USD 1,100,10 cm