sql-server t-sql grouping common-table-expression hierarchy

Identify and label each branch in a hierarchical structure

I am creating a cleansing view for a ledger table where transactions are stored with hierarchical structure in the source system. I want to create an additional column 'Group_Id' identifying all the records which belong to the same transaction set.

My goal is to find the most efficient way to populate the Group_Id column like in the following example:

Entry_No	Closed_By_Entry_No	Group_Id
1	4	1
2	4	1
3	4	1
4	5	1
5	0	1
6	9	2
7	9	2
8	9	2
9	11	2
10	11	2
11	0	2
12	14	3
13	14	3
14	0	3
15	16	4
16	0	4
17	0	5
18	20	6
19	20	6
20	22	6
21	22	6
22	0	6

Where Closed_By_Entry_No is the Parent_Id and Entry_No is the Child_Id and the Parent_Id of the top-most Child_Id of each set is always 0 (zero).

The challenge for me is that I don't have one hierarchy in my dataset, I have hundreds of thousand of them. Typically each set is made of two records: 1 invoice + 1 payments (70% of the records in total), but then there are more elaborated situations like in the above example where I can have e.g. 15 invoices + 3 payments + 1 credit note. And I need to identify all of those sets in a way I can treat them as a single subject for further calculations.

The only idea I have now to solve this is feeding a cursor with only

SELECT Entry_No, Closed_By_Entry_No 
FROM table 
WHERE Closed_By_Entry_No = 0

Scroll this cursor through the resultset sending each candidate record to a stored procedure containing the recursive CTE to read the hierarchy from the original table and populating the Group_Id incrementally from the cursor index itself and storing the result in a temp table. Then continue from that temp table ....

There must be a simpler method that doesn't use either cursors or temporary tables.

Solution

It's a straightforward recursive common table expression:

declare @LedgerEntries as Table ( Entry_No Int, Closed_By_Entry_No Int, Group_Id Int );
insert into @LedgerEntries ( Entry_No, Closed_By_Entry_No, Group_Id ) values
  ( 1, 4, 1 ),
  ( 2, 4, 1 ),
  ( 3, 4, 1 ),
  ( 4, 5, 1 ),
  ( 5, 0, 1 ),
  ( 6, 9, 2 ),
  ( 7, 9, 2 ),
  ( 8, 9, 2 ),
  ( 9, 11, 2 ),
  ( 10, 11, 2 ),
  ( 11, 0, 2 ),
  ( 12, 14, 3 ),
  ( 13, 14, 3 ),
  ( 14, 0, 3 ),
  ( 15, 16, 4 ),
  ( 16, 0, 4 ),
  ( 17, 0, 5 ),
  ( 18, 20, 6 ),
  ( 19, 20, 6 ),
  ( 20, 22, 6 ),
  ( 21,  22, 6 ),
  ( 22, 0, 6 );

select * from @LedgerEntries;

with Hierarchy as (
  -- Anchor query.
  --   Start with the closed by zero ledger entries and assign a group id.
  select Entry_No, Closed_By_Entry_No, Group_Id,
    Row_Number( ) over ( order by Entry_No ) as NewGroupId
    from @LedgerEntries
    where Closed_By_Entry_No = 0
  union all
  -- Recursive query.
  --   Add each level, working backwards, of ledger entries and propagate the new group id.
  select LE.Entry_No, LE.Closed_By_Entry_No, LE.Group_Id, H.NewGroupId
    from Hierarchy as H inner join
      @LedgerEntries as LE on LE.Closed_By_Entry_No = H.Entry_No
  )
  select *
    from Hierarchy
    order by Entry_No;

dbfiddle.

Entry_No	Closed_By_Entry_No	Group_Id
1	4	1
2	4	1
3	4	1
4	5	1
5	0	1
6	9	2
7	9	2
8	9	2
9	11	2
10	11	2
11	0	2
12	14	3
13	14	3
14	0	3
15	16	4
16	0	4
17	0	5
18	20	6
19	20	6
20	22	6
21	22	6
22	0	6

Entry_No	Closed_By_Entry_No	Group_Id
1	4	1
2	4	1
3	4	1
4	5	1
5	0	1
6	9	2
7	9	2
8	9	2
9	11	2
10	11	2
11	0	2
12	14	3
13	14	3
14	0	3
15	16	4
16	0	4
17	0	5
18	20	6
19	20	6
20	22	6
21	22	6
22	0	6

Entry_No	Closed_By_Entry_No	Group_Id
1	4	1
2	4	1
3	4	1
4	5	1
5	0	1
6	9	2
7	9	2
8	9	2
9	11	2
10	11	2
11	0	2
12	14	3
13	14	3
14	0	3
15	16	4
16	0	4
17	0	5
18	20	6
19	20	6
20	22	6
21	22	6
22	0	6