graph-theorydisjoint-setsdisjoint-union

Tracking the number of connected components when adding and removing edges


I'm working with an undirected graph where I need to frequently add and remove edges. I also need to efficiently determine the number of connected components after each operation.

For adding edges, I've found that using a Disjoint Set Union data structure provides a very efficient way to merge components and track the number of sets using the union operation.

However, removing an edge with a standard DSU seems to be less efficient. If an edge removal disconnects a component, the DSU doesn't inherently provide a way to easily split the sets. The typical approach seems to involve re-initializing the whole Disjoint Set Union, which is slow for a large number of edges.

Just to be clear I'll provide an example

Consider a graph with vertices V = {1, 2, 3, 4, 5} and initial edges E = {(1, 2), (2, 3), (4, 5)}. The connected components are {1, 2, 3} and {4, 5}.

Adding an edge (3, 4) results in new edges E' = {(1, 2), (2, 3), (4, 5), (3, 4)} and a single connected component {1, 2, 3, 4, 5}. Disjoint Set Union efficiently handles this.

Now, removing edge (2, 3) leaves edges E'' = {(1, 2), (4, 5), (3, 4)} and new connected components {1, 2} and {3, 4, 5}. The challenge is that using only a basic DSU I need to go though the whole graph to determine the connected components.


Solution

  • If you would like to efficiently dissect components upon removing edges (when necessary), just to decide whether the component falls into two parts, you'd have to search for connections not just locally, but across the component (to see if it is still one connected component), which would hint that a 1D data structure isn't sufficient to quickly gather this information, rather using a graph data structure or equivalent representations (e.g. adjacency matrix).

    I can suggest a direction you can take that may help:

    Adjacency matrix, ordered rows and columns: If you take the adjacency matrix and order the rows and columns such that nodes of a component continuously make up a section (i.e., sort the rows and columns e.g. by index of the component), then the matrix will consist of blocks (along the diagonal) representing the components, with 0 entries anywhere outside these blocks. There cannot be any connections between the components, which is why there are only 0 entries outside the blocks.
    Two diagonal blocks in an adjacency matrix, representing two disjoint components

    You may make use of this in multiple ways, I've seen sorting based on BFS be useful. Probably there is a fast solution in this direction.

    For example, after removing an edge, you may run BFS on only the nodes in the component that had an edge removed; and see if there are two blocks along the diagonal that have only 0 entries "across them". Here, you can apply just a simple sum operation to check if the component split to two (the sum of cells "across" the two blocks is zero).
    It may speed up the block searching that the edge removed has to have one node in one block and the other node in the other block, if there are 2 disjoint blocks after removing the edge.
    You may also make use of conditions when it's easy to tell that the component will not split into two, and skip the calculations.

    I'm thinking of some other approach where you store some other information between pairs of nodes (2D matrix) that can be used analogous to a look-up table to check if two nodes become disconnected if the edge between them was removed, updating this information quickly using paths, but so far I didn't come up with anything interesting.