[SOLVED] Flatten parent child hierarchy with multiple parents

Flatten parent child hierarchy with multiple parents

I have a parent-child hierarchy in my source structure where a child could point to his parent and for his parent could exist multiple rows. If we than flatten the hierarchy this would mean that every child row need to exist beneath his parent row.

Image below to clarify

I have already picked out my brains how to resolve this in a performant manner in T-SQL because my source set is 300K rows and this will lead to heavy processing. Help is greatly appreciated!

Code to directly start from the example above see below.

SELECT *
INTO Orders
FROM (SELECT CAST('A' AS NVARCHAR(1)) AS Id, CAST('' AS NVARCHAR(1)) AS ParentId, CAST(10 AS int) AS Qty UNION ALL
SELECT CAST('A' AS NVARCHAR(1)) AS Id, CAST('' AS NVARCHAR(1)) AS ParentId, CAST(20 AS int) AS Qty UNION ALL
SELECT CAST('B' AS NVARCHAR(1)) AS Id, CAST('A' AS NVARCHAR(1)) AS ParentId, CAST(30 AS int) AS Qty UNION ALL
SELECT CAST('B' AS NVARCHAR(1)) AS Id, CAST('A' AS NVARCHAR(1)) AS ParentId, CAST(40 AS int) AS Qty UNION ALL
SELECT CAST('C' AS NVARCHAR(1)) AS Id, CAST('B' AS NVARCHAR(1)) AS ParentId, CAST(50 AS int) AS Qty UNION ALL
SELECT CAST('C' AS NVARCHAR(1)) AS Id, CAST('B' AS NVARCHAR(1)) AS ParentId, CAST(60 AS int) AS Qty UNION ALL
SELECT CAST('C' AS NVARCHAR(1)) AS Id, CAST('B' AS NVARCHAR(1)) AS ParentId, CAST(70 AS int) AS Qty) AS T1

Solution

I have no idea why you would want to do this, but this is how I would do it...

SELECT
  L1.ID,
  L2.ID,
  L3.ID,
  COALESCE(L3.Qty, L2.Qty, L1.Qty)
FROM
  yourTable   AS L1
OUTER APPLY
(
  SELECT * FROM yourTable WHERE parentID = L1.ID
  UNION ALL
  SELECT NULL, NULL, NULL
)
  AS L2
OUTER APPLY
(
  SELECT * FROM yourTable WHERE parentID = L2.ID
  UNION ALL
  SELECT NULL, NULL, NULL
)
  AS L3
WHERE
  L1.ParentID IS NULL

You're going to get a LOT of rows though. Just in your example 8 input rows became 18 output rows. 300k input rows could easily become many millions of output rows, depending on the level of duplication in the data.