I am studying Kruskal's MST algorithm from this GeeksforGeeks article. The steps given are:
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If no cycle is formed, include this edge. Else, discard it.
3. Repeat step (2) until there are (V-1) edges in the spanning tree.
I don't really see the need for a disjoint set. To check for a cycle, we could instead keep a visited array and mark both endpoints of an edge as true whenever that edge is selected. While looping over the sorted edges, if we find an edge whose two vertices are both already marked as visited, we ignore that edge.
In other words, instead of storing a disjoint-set forest, can’t we just store an array of bits indicating which vertices have been linked to another edge in some previous step?
The approach you’re describing will not work properly in all cases. As an example, consider this line graph:
A - - B - - C - - D
Let’s assume A - B has weight 1, C - D has weight 2, and B - C has weight 3. What will Kruskal’s algorithm do here? Processing the edges in increasing order of weight, it’ll first add A - B, then C - D, and then B - C, since none of them closes a cycle.
Now imagine what your implementation will do. When we add A - B, you’ll mark A and B as having been visited. When we then add C - D, you’ll mark C and D as having been visited. But then when we try to add B - C, since both B and C are visited, you’ll decide not to add the edge, leaving a result that isn’t connected.
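To make the failure concrete, here is a minimal Python sketch of the visited-array idea as I understand it from your description, run on the line graph above. The function name `kruskal_with_visited_flags` and the `(weight, u, v)` edge format are my own choices for illustration, not anything from the article.

```python
def kruskal_with_visited_flags(vertices, edges):
    """Proposed variant: skip an edge only when both endpoints are already marked."""
    visited = {v: False for v in vertices}
    chosen = []
    # Edges are (weight, u, v) tuples, so sorting orders them by weight.
    for weight, u, v in sorted(edges):
        if visited[u] and visited[v]:
            # Treated as "would form a cycle", which is not always true.
            continue
        visited[u] = visited[v] = True
        chosen.append((u, v, weight))
        if len(chosen) == len(vertices) - 1:
            break
    return chosen


vertices = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (2, "C", "D"), (3, "B", "C")]
result = kruskal_with_visited_flags(vertices, edges)
print(result)  # [('A', 'B', 1), ('C', 'D', 2)]
```

Only two edges come back, even though a spanning tree on four vertices needs three. B - C is rejected although adding it would not create a cycle.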
The issue here is that when building up an MST you may add edges linking nodes that have already been linked to other nodes in the past. The criterion for adding an edge is therefore less “have these nodes been linked before?” and more “is there already a path between these nodes?” That’s where the disjoint-set forest comes in.
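For comparison, here is a rough sketch of Kruskal's algorithm with a simple disjoint-set forest on the same line graph. The names `find` and `kruskal` and the `(weight, u, v)` edge format are assumptions of mine, and I've left out union by rank to keep it short, so treat it as an illustration rather than a tuned implementation.

```python
def find(parent, x):
    """Return the representative of x's component (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # point x at its grandparent
        x = parent[x]
    return x


def kruskal(vertices, edges):
    parent = {v: v for v in vertices}  # every vertex starts in its own set
    mst = []
    # Edges are (weight, u, v) tuples, so sorting orders them by weight.
    for weight, u, v in sorted(edges):
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u == root_v:
            continue  # a path between u and v already exists: this edge closes a cycle
        parent[root_u] = root_v  # merge the two components
        mst.append((u, v, weight))
        if len(mst) == len(vertices) - 1:
            break
    return mst


vertices = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (2, "C", "D"), (3, "B", "C")]
mst = kruskal(vertices, edges)
print(mst)  # [('A', 'B', 1), ('C', 'D', 2), ('B', 'C', 3)]
```

Here B - C is accepted, because B and C sit in different components ({A, B} and {C, D}) even though both have already been touched by an earlier edge.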
It’s great that you’re poking and prodding conventional implementations and trying to find ways to improve them. You’ll learn a lot about those algorithms if you do! In this case, it just so happens that what you’re proposing doesn’t quite work, and seeing why it doesn’t work helps shed light on why the existing approach is what it is.