python-3.xnetworkxgraph-theory# Fast way to find all connected subgraphs of given size in Python?

*[Note: A fast solution was given in an answer however further improvement concerning speed is desirable.]*

Given is an undirected sparsely connected graph *G* with *n* vertices. I am looking for a fast way to find all connected subgraphs of *G* with *m* vertices. It can be assumed that *m* ≪ *n* and that the degree of the vertices of *G* is *deg(v) ≪n*.

**Graphical example**

For the example we have *n*=5, *m*=3.

The 4 connected subgraphs are: (0,1,2),(1,2,3),(0,2,3),(2,3,4)

**Code example**

Following code using `networkx`

and `ego_graph`

is quite slow. It takes about 100 seconds for a graph with *n*=100, *deg(v)*=10, *m*=4. It is an example for *m* ≪ *n* and *deg(v) ≪n*. The case

```
import networkx as nx
import itertools
import time
n=100 #nodes in graph
deg=10 #node degree in graph
m=4 #nodes in subgraph
graph=nx.gnm_random_graph(n,n*deg,seed=1)#construct random graph
starttime=time.time()
G = graph.copy()
all_connected_subgraphs = []
radius=m-1
for n in graph.nodes():
egoG = nx.ego_graph(G,n,radius=radius,center=False)# all closeby nodes
for sn in itertools.combinations(egoG, radius):# test all combinations of closeby nodes
SG=[n,*sn]
G_sub=G.subgraph(SG)
if nx.is_connected(G_sub):
all_connected_subgraphs.append(SG)
G.remove_node(n)
endtime=time.time()
print(endtime-starttime)
```

The slow steps are the lines `G_sub=G.subgraph(SG)`

and `if nx.is_connected(G_sub):`

that consume 20% and 80% of total calculation time, respectively.

In a related question no efficient solution was given.

Solution

Here's a solution using a recursive generator function: I didn't use NetworkX, so instead the graph is represented as a list of sets of neighbours.

```
def all_connected_subgraphs(g, m):
def _recurse(t, possible, excluded):
if len(t) == m:
yield t
else:
excluded = set(excluded)
for i in possible:
if i not in excluded:
new_t = (*t, i)
new_possible = possible | g[i]
excluded.add(i)
yield from _recurse(new_t, new_possible, excluded)
excluded = set()
for (i, possible) in enumerate(g):
excluded.add(i)
yield from _recurse((i,), possible, excluded)
```

It's a backtracking algorithm which says, given a partial subgraph of size `< m`

and a set of nodes that are allowed to be added, add each node and expand the set of possible options to include all nodes reachable from that node. To reduce symmetry, we keep track of which nodes have already been used and add these to the `excluded`

set.

Demonstration using your example graph:

```
>>> example_graph = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4}, {3}]
>>> list(all_connected_subgraphs(example_graph, 3))
[(0, 1, 2), (0, 2, 3), (1, 2, 3), (2, 3, 4)]
```

Here's the code I used to generate a random graph, which should be equivalent to the G(n, m) distribution you're sampling from:

```
import random
import itertools as it
def random_graph(n, num_edges):
adj_sets = [set() for i in range(n)]
all_edges = list(it.combinations(range(n), 2))
random.shuffle(all_edges)
for (i, j) in all_edges[:num_edges]:
adj_sets[i].add(j)
adj_sets[j].add(i)
return adj_sets
```

The running time on my laptop is under half a second to find all connected subgraphs of size 4 from a graph with 100 nodes and 1,000 edges, like your benchmark:

```
>>> g = random_graph(100, 1000)
>>> import timeit
>>> timeit.timeit(lambda: list(all_connected_subgraphs(g, 4)), number=1)
0.42606337900360813
```

Note that `list`

is needed here to exhaust the generator, otherwise the algorithm won't actually generate any subgraphs. If you just want to iterate over all connected subgraphs with a `for`

loop then you don't need to put them in a list first.

There are various "easy" ways to optimise this, e.g. the set of edges for each node `i`

can be filtered first to remove "back-edges" to nodes less than `i`

, and the recursion could be replaced with a stack to make an iterative algorithm that doesn't use `yield`

(which can have a relatively large performance overhead). But those optimisations are probably not needed unless your inputs need to be considerably bigger or denser.

- Collatz's hypothesis || want to reuse code instead of repeating
- why my directory copy endless when run copytree command of shutil?
- How can I reconnect to the browser opened by webdriver with selenium?
- Create and use WAV file as an object Python
- How to create a rank or an index column based on more than one column?
- Polars pairwise sum of array column
- Pandas not recognising older dates (pre-1600)
- How to make sense of this data extraction and convert it to pandas dataframe?
- How does Python's unittest library match patterns passed via the -p parameter?
- Python github composite action returns name not defined error
- How to run pytest-cov with pytest in VS Code?
- How do I append a string to a Path?
- What does a bare asterisk do in a parameter list? What are "keyword-only" parameters?
- Amazon textextract I can't find trp module
- Pydantic TypeError: validate() takes 2 positional arguments but 3 were given
- Attempted relative import with no known parent package
- How To Subscribe To Websocket API Channel Using Python?
- rdkit.Chem.rdForceFieldHelpers.UFFOptimizeMolecule(int) did not match C++ signature
- Connecting to SQL Server via SQL magics in Jupyter Lab
- Using httplib2 in python 3 properly? (Timeout problems)
- Incorrect calculation in the list processing logic based on dependencies between the elements of these lists
- Possible import error for pandas and matplotlib when calling Python3 via PHP shell_exec
- Attributes on Python flag enums
- Upload content from GET request to Cloud Storage
- mpmath : cannot create mpf
- DLL load failed when importing PyQt5
- How can I replace a substring in a Python pathlib.Path?
- ModuleNotFoundError: No module named 'polls'
- pyvenv-3.4 returned non-zero exit status 1
- stripe.error.SignatureVerificationError: No signatures found matching the expected signature for payload" - Python FastAPI