arraysalgorithmsortingmultidimensional-arrayhierarchical-data

How to identify orphaned nodes


I have a hierarchy of nodes stored in DB. I select all, store them in an array, then iterate over them and create a nested array in memory.

The input looks like this:

[{name: A}, {name: B}, {name: X, parent: A}, {name: Y, parent: A}, {name: C}]

The output looks like this:

[{name: A, children:[{name: X}, {name: Y}]}, {B}, {C}]

There is no limit on how deep the nesting can go.

The problem I have is that if one of the records has an invalid parent reference, it cannot be put in the hierarchy and the script ends in an infinite loop, trying to find the parent.

I bet there's a way to tell when I've fallen into the infinite loop. For the record, when in the loop I realize there's no parent to insert the item into, I push the item at the end of the array because the parent might exists down the line.

I suppose I should be able to realize that I'm cycling the same items over and over again?

Edit 1 - the code This is the important bit:

    $cnt = count($array);
    do {
        $item = array_shift($array);
        if ($this->push($item)) {
            $cnt--;
        } else {
            array_push($array, $item);
        }
    } while ($cnt > 0);

($this->push() is a method that tries to find a parent and, if it succeeds, inserts $item into its hierarchy)


Solution

  • Looks like you are using a queue (remove from the front, add to the back) kind of data structure to store unprocessed nodes. As nodes are inserted into your output data structure they are dropped from the queue. If a node cannot be added to the output (because its 'parent' has not been moved to the output data structure yet) it is re-queued. Eventually the queue should become empty unless there are nodes where the 'parent' does not exist (orphans).

    I imagine your algorithim looks something like

     Do While not QueueEmpty()
        node = Dequeue() ' Remove from the front
        if not AddNodeToTree(node) then Queue(node) 'add to the back
     end
    

    Where AddNodeToTree is a function that takes a node, successfully adds it to the output and returns True. Otherwise it returns False causing the node to recycle.

    The only thing you should have to do is add a sentinal node to the back of the queue and a flag to indicate that at least one node was consumed from the queue during one complete cycle through it. The above algorithm becomes:

    set NodeProcessed to False
    Queue(SentinalNode) ' marker to identify cycle through queue
    Do while not QueueEmpty()
      node = Dequeue()
      if isSentinalNode(node) then
         if NodeProcessed then 
            Queue(node)
            set NodeProcessed to False
         else
            ' Queue contains only orphans or is empty
         end
      end
      if AddNodeToTree(node) then
         set NodeProcessed to True
      else
         Queue(node)
      end
    end
    

    The SentinalNode is a marker that you use to detect looping through the queue.

    Your output data structure looks like it contains a 'forest' of trees. That is, it contains several distinct trees. If there is any possibility that a given node may be shared among two or more trees, the above algorithm will not work properly. If a node may appear in at most one tree in the 'forest' then the above should be fine.