Suppose I have a hierarchical structure like this.
Father ID | Child ID |
---|---|
1 | 2 |
1 | 3 |
1 | 7 |
3 | 4 |
3 | 5 |
3 | 6 |
In my case, those are production orders, that demand from (father) or supply to (child) each other. I need to bring those orders to a structure like this:
Master Father | Hierarchy | Child |
---|---|---|
1 | 001.001 | 2 |
1 | 001.002 | 3 |
1 | 001.002.001 | 4 |
1 | 001.002.002 | 5 |
1 | 001.002.003 | 6 |
1 | 001.003 | 7 |
Afterwards I have to join the components to the child orders, that were used to create the master father. The target is, to trace the origin of the components, that were used to create a certain product.
The big problem is, that I can't predict, how deep those tree structures will become in the future. But I want to create a report, that won't break, when they get deeper.
So far I counted the depth manually. If the Master Father is 0 then I have 2 Levels in my example case. The (pseudo) code for the transformation would look like this:
Select
L.Father_ID as Level0
, '001' as Hierarchy0
, L.Child_ID as Level1
, Dense_Rank() over (partition by L.Father_ID) as Hierarchy1
, R1.Child_ID as Level2
, Dense_Rank() over (partition by L.Father_ID, L.Child_ID) as Hierarchy2
Into #exploded_table
From Table as L
Left Join Table as R1
on L.Child_ID = R1.Father_ID
Select distinct
Level0 as Master_Father
, Concat(Hierarchy0,'.',format(Hirarchy1, '000')) as Hierarchy
, Level1 as Child
From #exploded_table
Union all
Select distinct
Level0 as Master_Father
, Concat(Hierarchy0,'.',format(Hirarchy1, '000'),'.',format(Hirarchy2, '000')) as Hierarchy
, Level2 as Child
From #exploded_table
I have two problems with this code.
Therefore I started to write dynamic code. It first counts the depth of the deepest tree and then it dynamically creates the code with as much levels as are needed.
When you execute this code the amount of columns is (as far as I understand it) called "Non-Deterministic". And MS SQL hates "Non-Deterministic" things. To be allowed to store this as a temp table I have to create a temp table outside of the scope of the EXEC sp_executesql
function. And then I have to dynamically modify the columns of this table to fit exactly to the result of the dynamic SQL statement that I pass to the Exec function. I won't use global temp tables since this will lead to chaos when multiple reports have the same names for their temp tables.
The above described method is complicated and unreadable! I can do it, but it feels like the opposite of good practice.
So I want to ask the community whether this can't be done in a simpler way. I came from a company that used PySpark where this was easily doable and now I am here and all I have is SQL Server and the SQL Server Reporting Services.
I believe you are mostly along the right track. However, as you have defined columns (e.g., Master_Father_ID, Hierarchy, and Child_ID) you don't need dynamic SQL.
Rather than a recursive CTE, I've written the below as a WHILE loop to help understand what's happening. It's easy enough to convert to a recusive CTE but I think (especially to many people familiar with other programming languages) that a WHILE loop is more familiar.
The logic of the below is as follows
Then the final report needs to simply ignore the initial Master_Father rows inserted (e.g., what would be 001, 002 etc in your table above).
The SQL below can also be found in this db<>fiddle - noting that I have added extra data for demonstration purposes.
Initial data
CREATE TABLE #exploded_table (
Father_ID int,
Child_ID int,
PRIMARY KEY (Father_ID, Child_ID)
);
INSERT INTO #exploded_table (Father_ID, Child_ID) VALUES
(1, 2), (1, 3), (1, 7), (3, 4), (3, 5), (3, 6);
Now for the working table - I've called it #summary_table. Current_ID represents the current Child_ID for that row (e.g., at the deepest level of the hierarchy). insert_round is the round number (loop iteration) that the row was inserted.
I also have two variables: @round that indicates which round/cycle we're up to, and @n indicating the number of rows inserted in the current round.
CREATE TABLE #summary_table (
Master_Father_ID int,
Current_ID int,
Hierarchy nvarchar(500),
insert_round int
);
DECLARE @round int = 1;
DECLARE @n int = 0;
Step 1: Insert master fathers
Master fathers are those that are not themselves children e.g., Father_Id 1 is not a child, so it is included; Father_ID 3 is a child, therefore isn't a master father.
Note that I'm using RIGHT(N'000' + LTRIM(STR(int value)), 3)
to convert any integer number to 3-digit format as a string.
INSERT INTO #summary_table (Master_Father_ID, Current_ID, Hierarchy, insert_round)
SELECT DISTINCT Father_ID,
Father_ID,
RIGHT(N'000' + LTRIM(STR(DENSE_RANK() OVER (ORDER BY Father_ID))), 3),
@round
FROM #exploded_table
WHERE Father_ID NOT IN (SELECT Child_ID FROM #exploded_table);
SET @n = @@ROWCOUNT;
Step 2: Find any children of the last round and insert their data into the working table
(Also Step 3: Repeat until no more is happening)
Taking the data from the last round of insertions (identified by #summary_table.insert_round), identify any children and insert their data into the working table.
Keep doing this until you don't insert any more rows.
WHILE @n > 0
BEGIN
INSERT INTO #summary_table (Master_Father_ID, Current_ID, Hierarchy, insert_round)
SELECT DISTINCT
#summary_table.Master_Father_ID,
#exploded_table.Child_ID,
#summary_table.Hierarchy + N'.' + RIGHT(N'000' + LTRIM(STR(DENSE_RANK() OVER (PARTITION BY #summary_table.Master_Father_ID, #summary_table.Current_ID ORDER BY #exploded_table.Child_ID))), 3),
@round + 1
FROM #summary_table
INNER JOIN #exploded_table ON #summary_table.Current_ID = #exploded_table.Father_ID
WHERE #summary_table.insert_round = @round;
SET @n = @@ROWCOUNT;
SET @round += 1;
END;
Then the final report
SELECT Master_Father_ID, Hierarchy, Current_ID AS Child_ID
FROM #summary_table
WHERE insert_round > 1
ORDER BY Hierarchy;
Data is as follows
Master_Father_ID Hierarchy Child_ID
1 001.001 2
1 001.002 3
1 001.002.001 4
1 001.002.002 5
1 001.002.003 6
1 001.003 7
Note - this assumes your original data is set up properly and has no loops/etc (e.g., if Kyle Reese was John Connor's child as well as his father... err sorry, 'spolier alert'). If you have those, you need to add extra checks - and those checks will depend on how you actually want to deal with those loops.