data-structures minimum-spanning-tree kruskals-algorithm

Kruskal's Algorithm Minimum Spanning Tree (Disjoint Set data structure)

I would like to generate a minimum spanning tree for the following graph:

I am using the disjoint set data structure and I am having trouble with the union (Union Rank) operation.

The edges get sorted as show below:

(1, 6) - 5;
(3, 4) - 12;
(2, 7) - 14;
(2, 3) - 16;
(4, 7) - 18;
(4, 5) - 22;
(5, 6) - 25;
(5, 7) - 26;
(1, 2) - 28;

I am getting a totally irrelevant result when compared to the correct MST and correct total minimum cost of 94.

How should can a minimum spanning tree be generated from a disjoint set (union rank) and the cost be calculated?

Note that the correct minimum spanning tree is:

The tree obtained from the merging operations is shown in the image below.

Tree obtained from the merging operations

A connection is indicated by <- and the sequence of operations is:

1 <- 6; 3 <- 4; 2 <- 7; 3 <- (2 <- 7); 4 <-7 gets excluded as it forms a cycle; 3 <- 5 (4 <- 5 occurs and 5 gets connected to 3 as the 4 is the ultimate ancestor of 3; 5 <- 6 (3 <- (1 <- 6) as the ultimate ancestor of 6 is 1 and the ultimate ancestor of 5 is 3. 3 has a higher rank and 1 <- 6 gets added to the tree in which 3 is the ultimate ancestor.

Solution

The algorithm you mention in the title is quite straightforward, so if you get a wrong result, you have not implemented it correctly.

The Wikipedia page on Kruskal's algorithm explains it and provides pseudo code. The important steps of the algorithm are:

Initially consider each node to be in its own set
Visit the edges in order of weight, and for each do:
- If the two end points of the visited edge do not belong to the same set, then:
  - Consider the edge to be on the MST
  - Merge the two sets (using Union-Find)

Here is an implementation in Python:

from collections import defaultdict

class DisjointSetNode:  # Generic, rank-based Union-Find implementation 
    def __init__(self):
        self.parent = self
        self.rank = 0

    def find(self):
        if self != self.parent:
            self.parent = self.parent.find()
        return self.parent

    def union(self, other):
        a = self.find()
        b = other.find()
        if a != b:
            if a.rank < b.rank:
                a, b = b, a
            b.parent = a
            if a.rank == b.rank:
                a.rank += 1

def mst(edges):
    nodes = defaultdict(DisjointSetNode)

    for edge in sorted(edges, key=lambda x: x[2]): # Visit edges in order of weight
        # If a nodes entry is not found, that entry is created 
        #    with a new DisjointSetNode instance on-the-fly
        a = nodes[edge[0]]
        b = nodes[edge[1]]
        if a.find() != b.find():
            yield edge  # This edge is on our MST
            a.union(b)

You can run it as follows on the example graph you have given:

edges = [
    (1, 6, 5),   # node1, node2, weight
    (3, 4, 12),
    (2, 7, 14),
    (2, 3, 16),
    (4, 7, 18),
    (4, 5, 22),
    (5, 6, 25),
    (5, 7, 26),
    (1, 2, 28)
]
mst_edges = list(mst(edges))
print("MST:", mst_edges)
print("Total weight of MST:", sum(edge[2] for edge in mst_edges))

This outputs:

MST: [(1, 6, 5), (3, 4, 12), (2, 7, 14), (2, 3, 16), (4, 5, 22), (5, 6, 25)]
Total weight of MST: 94

Stepping through the Algorithm

The input example graph looks like this:

We start out with just the nodes of the graph -- to which we want to add MST edges, and a disjoint set data structure where each node sits in its own set. I have pictured the disjoint set data structure at the right, using blue colors, and with the ranks displayed in small fonts, starting out all as zero:

The edges are visited in order of weight so the first one is the edge between nodes 1 and 6. That means we add this edge to the MST and merge the two sets into one:

Similarly the next two edges -- between 3 and 4, and between 2 and 7 -- are added. Nothing unusual happens:

Then the edge between 2 and 3 is added. When merging the disjoint sets for these two nodes, the root of the one becomes the child of the other root, and the rank is updated, giving a set that contains nodes 2, 3, 4 and 7:

The next edge that is visited is between nodes 4 and 7, but the disjoint set structure shows that node 4 has a root that is the same as the one for node 7: that root is 2. So this edge is discarded: it is not part of the MST as it would create a cycle.

Then comes the edge between nodes 4 and 5. The disjoint-set-root for node 4 is 2, while node 5 is still in its own set. The merge makes node 5 a child of 2. This time there is no need to update the rank:

The next edge is between nodes 5 and 6. The disjoint-set-roots of these two are 1 and 2. So those two get linked up in the disjoint set structure while the edge is added to the MST:

The MST is now complete as the disjoint set structure has just one overall set (which is rooted by 2). The three remaining edges to visit will thus concern nodes that already belong to the same set, and they will therefore be skipped.

Hope this clarifies it.