The title may not be accurate for the question but here goes! I have the following table:
id1 id2 status
1 2 a
2 3 b
3 4 c
6 7 d
7 8 e
8 9 f
9 10 g
I would like to get the first id1
and last status
based on a dynamic chain joining, meaning that the result table will be:
id final_status
1 c
6 g
Logically, I want to construct the following arrays based on joining the table to itself:
id1 chained_ids chained_status
1 [2,3,4] [a,b,c]
6 [7,8,9,10] [d,e,f,g]
Then grab the last element of the chained_status
list.
Since if we were to keep joining this table to itself on id1 = id2
we would eventually have single rows with these results. The problem is that the number of joins is not constant (a single id
may be chained many or few times). There is always a 1 to 1 mapping of id1
to id2
.
Thanks in advance! This can be done in either T-SQL or Hive (if someone has a clever map-reduce solution).
You can do this with a recursive CTE:
;WITH My_CTE AS
(
SELECT
id1,
id2,
status,
1 AS lvl
FROM
My_Table T1
WHERE
NOT EXISTS
(
SELECT *
FROM My_Table T2
WHERE T2.id2 = T1.id1
)
UNION ALL
SELECT
CTE.id1,
T3.id2,
T3.status,
CTE.lvl + 1
FROM
My_CTE CTE
INNER JOIN My_Table T3 ON T3.id1 = CTE.id2
)
SELECT
CTE.id1,
CTE.status
FROM
My_CTE CTE
INNER JOIN (SELECT id1, MAX(lvl) AS max_lvl FROM My_CTE GROUP BY id1) M ON
M.id1 = CTE.id1 AND
M.max_lvl = CTE.lvl