sqlsnowflake-cloud-data-platform

How to get every value associated with another value as with family members?


I think I've almost figured out a solution using the array_agg function in Snowflake SQL, but I'm stuck on the last part.

So for example, say that family instances have a UID like family_id, but I want to merge all members that were ever related to one another into a single array.

Here's the base table:

family_id member_id
abc 111222
abc 333444
xyz 111222
xyz 555666
def 777888
def 999000

I've been able to get this:

family_id member_list
abc 111222,333444
xyz 111222,555666
def 777888,999000

but my desired output is this:

member_list
111222,333444,555666
777888,999000

Here's my query so far:

with family1 as (
    select 111222 as member_id, 'abc' as family_id union
    select 333444 as member_id, 'abc' as family_id union
    select 111222 as member_id, 'xyz' as family_id union
    select 555666 as member_id, 'xyz' as family_id union
    select 777888 as member_id, 'def' as family_id union
    select 999000 as member_id, 'def' as family_id
)

select
   family_id
   ,array_to_string(array_agg(member_id) within group (order by member_id asc), ',') as member_list
from family1
group by family_id

Any ideas on how I can get my desired output?


Solution

  • Steps

    1. CTE member_connections : member connections will identify members within the same family.

    2. CTE initial_connections : All members within same family are connected in this

    3. CTE recursive_groups : Propagate connections until all members are assigned to a group (this step mimics a graph traversal)

    4. CTE final_groups : This ensures all members are assigned to the same group, preventing duplicates

    5. CTE aggregated_members : Aggregate members by their group and generate the final list

         WITH family1 AS (     
          SELECT 111222 as member_id, 'abc' as family_id union
          SELECT 333444 as member_id, 'abc' as family_id union
          SELECT 111222 as member_id, 'xyz' as family_id union
          SELECT 555666 as member_id, 'xyz' as family_id union
          SELECT 777888 as member_id, 'def' as family_id union
          SELECT 999000 as member_id, 'def' as family_id
      ),
      member_connections AS (
          SELECT f1.member_id AS member_id_1, f2.member_id AS member_id_2
          FROM family1 f1
          JOIN family1 f2 
              ON f1.family_id = f2.family_id 
              AND f1.member_id != f2.member_id
      ),
      initial_connections AS (
          SELECT DISTINCT member_id_1 AS member_id, member_id_1 AS group_id
          FROM member_connections
          UNION ALL
          SELECT DISTINCT member_id_2 AS member_id, member_id_1 AS group_id
          FROM member_connections
      ),
      recursive_groups AS (
          SELECT member_id, group_id
          FROM initial_connections
          UNION ALL
          SELECT c.member_id, g.group_id
          FROM recursive_groups g
          JOIN initial_connections c
              ON g.group_id = c.member_id
          WHERE g.member_id != c.member_id
      ),
      
      
      final_groups AS (
          SELECT DISTINCT member_id, MIN(group_id) AS group_id
          FROM recursive_groups
          GROUP BY member_id
      ),
      aggregated_members AS (
          SELECT group_id, array_to_string(array_agg(DISTINCT member_id) WITHIN GROUP (ORDER BY member_id), ',') AS member_list
          FROM final_groups
          GROUP BY group_id
      )
      
      SELECT member_list
      FROM aggregated_members
      ORDER BY group_id;
      

    enter image description here