I have a table containing incident ids. Each incident can be linked to another one because they have been stated as similar. The table contains two columns:
I would like to make groups based on these ids.
I tried to make left joins to link the ids but if an id is linked to n ids I will need to make n left joins. Imagine n=100, this would become absurd.
Here is a simplified version of my table:
incident_id | linked_incident_id |
---|---|
1 | 5 |
4 | 8 |
2 | 3 |
8 | 2 |
9 | 10 |
5 | null |
7 | 9 |
2 | 8 |
9 | 7 |
6 | 10 |
10 | null |
12 | 11 |
Here is the expected output:
incident_id | linked_incident_id | groups |
---|---|---|
1 | 5 | group 1 |
4 | 8 | group 2 |
2 | 3 | group 2 |
8 | 2 | group 2 |
9 | 10 | group 3 |
5 | null | group 1 |
7 | 9 | group 3 |
2 | 8 | group 2 |
9 | 7 | group 3 |
6 | 10 | group 3 |
10 | null | group 3 |
12 | 11 | group 4 |
13 | null | group 5 |
The actual data I am working with is bigger (approx. 25000 lines).
I am actually working with Snowflake and PostgreSQL 17, but any SQL engine would be helpful, I am missing an effective logic here. An SQL query would be ideal !
So, the problem is that incident1 is linked to incident2 which is linked to incident3 and so on and you need to make them be part of the same group. First of all, this is already possible with your data via recursive queries, here's an untested example
WITH RECURSIVE t AS (
SELECT incident_id as group_id, incident_id from incidents where linked_incident_id is null
union all
select t.group_id, incidents.incident_id from t join incidents on t.incident_id = incidents.linked_incident_id
)
SELECT
group_id,
string_agg(incident_id::text, ',')
FROM t
group by group_id;
Tried this with the sample of
create table incidents (incident_id int, linked_incident_id int);
insert into incidents(incident_id, linked_incident_id)
values
(1, null),
(2, null),
(0, null),
(3, 1),
(5, 1),
(7, 3),
(9, 3),
(10, 2);
So in this recursive query we first define the trivial records, that is, records whose linked incident is null (the root incidents) and define their incident_id as a group_id, which will later on transcend to their descendants via the second select after the union all, which finds descendants, then descendants of descendants and so on.
With this recursive logic we will end up having pairs of (group_id
, incident_id
) which are the pairs of root incidents and the current incident.
With this tool at our disposal we can simply select from this one, grouping by the group_id
and concatenating the incident_id
.
However, this is not an ideal solution. Because then you may end up with problems, when two issues link each-other and so forth. And of course, doing this kind of recursive search may not be best for performance either.
So, even though this is a solution for your current setup, I propose the following steps:
null
if the linked incident is null
and the root of the linked incident otherwisenull
root and adjust the new root incident of their descendantsThis way you will have a consistent database and you will not need recursion whenever you want to group by root incident