I am creating a cleansing view for a ledger table where transactions are stored with hierarchical structure in the source system. I want to create an additional column 'Group_Id' identifying all the records which belong to the same transaction set.
My goal is to find the most efficient way to populate the Group_Id
column like in the following example:
Entry_No | Closed_By_Entry_No | Group_Id |
---|---|---|
1 | 4 | 1 |
2 | 4 | 1 |
3 | 4 | 1 |
4 | 5 | 1 |
5 | 0 | 1 |
6 | 9 | 2 |
7 | 9 | 2 |
8 | 9 | 2 |
9 | 11 | 2 |
10 | 11 | 2 |
11 | 0 | 2 |
12 | 14 | 3 |
13 | 14 | 3 |
14 | 0 | 3 |
15 | 16 | 4 |
16 | 0 | 4 |
17 | 0 | 5 |
18 | 20 | 6 |
19 | 20 | 6 |
20 | 22 | 6 |
21 | 22 | 6 |
22 | 0 | 6 |
Where Closed_By_Entry_No
is the Parent_Id
and Entry_No
is the Child_Id
and the Parent_Id
of the top-most Child_Id
of each set is always 0 (zero).
The challenge for me is that I don't have one hierarchy in my dataset, I have hundreds of thousand of them. Typically each set is made of two records: 1 invoice + 1 payments (70% of the records in total), but then there are more elaborated situations like in the above example where I can have e.g. 15 invoices + 3 payments + 1 credit note. And I need to identify all of those sets in a way I can treat them as a single subject for further calculations.
The only idea I have now to solve this is feeding a cursor with only
SELECT Entry_No, Closed_By_Entry_No
FROM table
WHERE Closed_By_Entry_No = 0
Scroll this cursor through the resultset sending each candidate record to a stored procedure containing the recursive CTE to read the hierarchy from the original table and populating the Group_Id
incrementally from the cursor index itself and storing the result in a temp table. Then continue from that temp table ....
There must be a simpler method that doesn't use either cursors or temporary tables.
It's a straightforward recursive common table expression:
declare @LedgerEntries as Table ( Entry_No Int, Closed_By_Entry_No Int, Group_Id Int );
insert into @LedgerEntries ( Entry_No, Closed_By_Entry_No, Group_Id ) values
( 1, 4, 1 ),
( 2, 4, 1 ),
( 3, 4, 1 ),
( 4, 5, 1 ),
( 5, 0, 1 ),
( 6, 9, 2 ),
( 7, 9, 2 ),
( 8, 9, 2 ),
( 9, 11, 2 ),
( 10, 11, 2 ),
( 11, 0, 2 ),
( 12, 14, 3 ),
( 13, 14, 3 ),
( 14, 0, 3 ),
( 15, 16, 4 ),
( 16, 0, 4 ),
( 17, 0, 5 ),
( 18, 20, 6 ),
( 19, 20, 6 ),
( 20, 22, 6 ),
( 21, 22, 6 ),
( 22, 0, 6 );
select * from @LedgerEntries;
with Hierarchy as (
-- Anchor query.
-- Start with the closed by zero ledger entries and assign a group id.
select Entry_No, Closed_By_Entry_No, Group_Id,
Row_Number( ) over ( order by Entry_No ) as NewGroupId
from @LedgerEntries
where Closed_By_Entry_No = 0
union all
-- Recursive query.
-- Add each level, working backwards, of ledger entries and propagate the new group id.
select LE.Entry_No, LE.Closed_By_Entry_No, LE.Group_Id, H.NewGroupId
from Hierarchy as H inner join
@LedgerEntries as LE on LE.Closed_By_Entry_No = H.Entry_No
)
select *
from Hierarchy
order by Entry_No;