sqlsql-serverrecursion

SQL Server recursion generating full nested membership


I am able to generate a recursion query thanks to this post here: SQL Server recursion with multiple tables

But now I'm trying to get what a full listing of all the parent/child relationships in for multiple levels. Here is the example:

create table groups_groups (
    group_id int not null,
    child_id int not null
)

create table groups(group_id int not null)

insert into  groups_groups values (78,80)
insert into  groups_groups values (80,79)
insert into groups values (78)
insert into groups values (70)
insert into groups values (80)
;

with cte as (
    select
        0 as gglevel,
        0 as parentid,
        groups.group_id ,
        convert(varchar(max), concat(',', 0,':',groups.group_id)) as visited
    from groups
    union all
    select
        gglevel+1 as gglevel,
        gg.group_id as parentid,
        gg.child_id,
        convert(varchar(max), concat(cte.visited,',',gg.group_id,':',gg.child_id))
            as visited
    from cte
    join groups_groups gg
        on gg.group_id = cte.group_id
    where visited not like concat('%,',gg.group_id,':',gg.child_id, ',%')
)
select * from cte
order by group_id,gglevel
gglevel parentid group_id visited
0 0 70 ,0:70
0 0 78 ,0:78
1 80 79 ,0:80,80:79
2 80 79 ,0:78,78:80,80:79
0 0 80 ,0:80
1 78 80 ,0:78,78:80

fiddle

In the above, the visited column is there because in my full dataset, we have many loops and places where the parent equals the child and without, would have infinite loops.

This mostly works, except in the 4th row we see parent to group of 80-79 again. What I would hope to see is one row where 80 gives 79 (correct) and the level 2 row should be 78 gives 79 (it does because of 80 in the middle).

Is there a way to have levels 2 (and higher since we can have up to 10-20 levels) show all the possible combinations, all of the ways that I could be granted 79?

And is there a better way to prevent the infinite loop than this string concat/where clause?

Thank you!


Solution

  • A few thoughts:

    1. Since SQL Server doesn't have array types, the delimited string is a common way or tracking visited nodes in cases like this.
    2. For this to work, you also need to maintain a trailing comma in the visited list. On the other hand, the "parent_id:" portion can probably be dropped.
    3. To get the results you want showing start/end node IDs across multiple steps, your anchor query should record a starting ID, that is then copied unchanged down into each level of the recursive query.
    4. If you want to have both zero and non-zero nodes as your starting point, you may need to add the 0-node and 0-node-links to your source data, or find some other way to inject the 0-node cases into the logic.

    Applying these changes to your posted query, we get:

    with cte as (
        -- Anchor part
        select
            0 as gglevel,
            groups.group_id as starting_id, -- *** Changed ***
            groups.group_id,
            convert(varchar(max),
            concat(',' ,groups.group_id, ',')) as visited -- *** Changed ***
        from groups
        union all
        -- Recursive part
        select
            gglevel+1 as gglevel,
            cte.starting_id, -- *** Changed ***
            gg.child_id,
            convert(varchar(max), concat(cte.visited, gg.child_id, ',')) -- *** Changed ***
                as visited
        from cte
        join groups_groups gg
            on gg.group_id = cte.group_id
        where visited not like concat('%,', gg.child_id, ',%') -- *** Changed ***
    )
    select * from cte
    order by
        starting_id,
        gglevel,
        group_id;
    

    Results (with current sample data):

    gglevel starting_id group_id visited
    0 70 70 ,70,
    0 78 78 ,78,
    1 78 80 ,78,80,
    2 78 79 ,78,80,79,
    0 80 80 ,80,
    1 80 79 ,80,79,

    To include 0-node starting values, you can add the following data:

    insert into  groups_groups values (0,78);
    insert into  groups_groups values (0,80);
    insert into  groups_groups values (0,79);
    insert into groups values (0);
    

    Alternately, you could kluge-up the CTE with additional anchor queries to fill in the missing 0-node data.

    with cte as (
        -- Anchor part (1 of 3) - 0:0
        select
            0 as gglevel,
            0 as starting_id, -- *** Changed ***
            0 as group_id,
            convert(varchar(max), ',0,') as visited -- *** Changed ***
        union all
        -- Anchor part (2 of 3) - 0:group (not defined in 
        select
            1 as gglevel,
            0 as starting_id, -- *** Changed ***
            groups.group_id,
            convert(varchar(max), concat(',0,' ,groups.group_id, ',')) as visited -- *** Changed ***
        from groups
        -- Anchor part (3 of 3) - group:group
        union all
        select
            0 as gglevel,
            groups.group_id as starting_id, -- *** Changed ***
            groups.group_id,
            convert(varchar(max), concat(',' ,groups.group_id, ',')) as visited -- *** Changed ***
        from groups
        union all
        -- Recursive part
        select
            gglevel+1 as gglevel,
            cte.starting_id, -- *** Changed ***
            gg.child_id,
            convert(varchar(max), concat(cte.visited, gg.child_id, ',')) -- *** Changed ***
                as visited
        from cte
        join groups_groups gg
            on gg.group_id = cte.group_id
        where visited not like concat('%,', gg.child_id, ',%') -- *** Changed ***
    )
    select * from cte
    order by
        starting_id,
        gglevel,
        group_id;
    

    Results (with 0-node starting values):

    gglevel starting_id group_id visited
    0 0 0 ,0,
    1 0 78 ,0,78,
    1 0 79 ,0,79,
    1 0 80 ,0,80,
    2 0 79 ,0,80,79,
    2 0 80 ,0,78,80,
    3 0 79 ,0,78,80,79,
    0 70 70 ,70,
    0 78 78 ,78,
    1 78 80 ,78,80,
    2 78 79 ,78,80,79,
    0 80 80 ,80,
    1 80 79 ,80,79,

    The above reflects multiple paths to get from the 0-node to several other nodes. (This could also be the case with non-zero starting nodes with different data.) It also shows the start = end cases. You can add filtering logic (deduping, shortest path, etc.) to trim the results down to suit your preferences.

    See this db<>fiddle for a demo.