I think I've almost figured out a solution using the array_agg
function in Snowflake SQL, but I'm stuck on the last part.
So for example, say that family instances have a UID like family_id
, but I want to merge all members that were ever related to one another into a single array.
Here's the base table:
family_id | member_id |
---|---|
abc | 111222 |
abc | 333444 |
xyz | 111222 |
xyz | 555666 |
def | 777888 |
def | 999000 |
I've been able to get this:
family_id | member_list |
---|---|
abc | 111222,333444 |
xyz | 111222,555666 |
def | 777888,999000 |
but my desired output is this:
member_list |
---|
111222,333444,555666 |
777888,999000 |
Here's my query so far:
with family1 as (
select 111222 as member_id, 'abc' as family_id union
select 333444 as member_id, 'abc' as family_id union
select 111222 as member_id, 'xyz' as family_id union
select 555666 as member_id, 'xyz' as family_id union
select 777888 as member_id, 'def' as family_id union
select 999000 as member_id, 'def' as family_id
)
select
family_id
,array_to_string(array_agg(member_id) within group (order by member_id asc), ',') as member_list
from family1
group by family_id
Any ideas on how I can get my desired output?
Steps
CTE member_connections : member connections will identify members within the same family.
CTE initial_connections : All members within same family are connected in this
CTE recursive_groups : Propagate connections until all members are assigned to a group (this step mimics a graph traversal)
CTE final_groups : This ensures all members are assigned to the same group, preventing duplicates
CTE aggregated_members : Aggregate members by their group and generate the final list
WITH family1 AS (
SELECT 111222 as member_id, 'abc' as family_id union
SELECT 333444 as member_id, 'abc' as family_id union
SELECT 111222 as member_id, 'xyz' as family_id union
SELECT 555666 as member_id, 'xyz' as family_id union
SELECT 777888 as member_id, 'def' as family_id union
SELECT 999000 as member_id, 'def' as family_id
),
member_connections AS (
SELECT f1.member_id AS member_id_1, f2.member_id AS member_id_2
FROM family1 f1
JOIN family1 f2
ON f1.family_id = f2.family_id
AND f1.member_id != f2.member_id
),
initial_connections AS (
SELECT DISTINCT member_id_1 AS member_id, member_id_1 AS group_id
FROM member_connections
UNION ALL
SELECT DISTINCT member_id_2 AS member_id, member_id_1 AS group_id
FROM member_connections
),
recursive_groups AS (
SELECT member_id, group_id
FROM initial_connections
UNION ALL
SELECT c.member_id, g.group_id
FROM recursive_groups g
JOIN initial_connections c
ON g.group_id = c.member_id
WHERE g.member_id != c.member_id
),
final_groups AS (
SELECT DISTINCT member_id, MIN(group_id) AS group_id
FROM recursive_groups
GROUP BY member_id
),
aggregated_members AS (
SELECT group_id, array_to_string(array_agg(DISTINCT member_id) WITHIN GROUP (ORDER BY member_id), ',') AS member_list
FROM final_groups
GROUP BY group_id
)
SELECT member_list
FROM aggregated_members
ORDER BY group_id;