I'm working on a Python project that involves processing a very large graph: millions of nodes and edges. The goal is to perform a breadth-first search (BFS) or depth-first search (DFS) from a given start node to compute shortest paths or reachability.
Here's the challenge: currently I have a basic BFS implementation using a Python dictionary for adjacency, but it runs slowly and hits memory limits:
```python
from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            queue.extend(graph.get(node, []))
    return visited
```
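One incidental issue with the version above: a node can be enqueued many times before it is finally marked visited, which inflates the queue on dense graphs. A sketch of the same BFS that marks nodes at enqueue time instead, and also records hop distances (since my goal includes shortest paths); `graph` is the same dict-of-lists adjacency as before:

```python
from collections import deque

def bfs_distances(graph, start):
    """BFS that marks nodes when enqueued rather than when dequeued,
    so each node enters the queue at most once; returns hop counts."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in dist:  # the visited check happens before enqueue
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# Tiny example: a -> b, b -> c and b -> d
g = {"a": ["b"], "b": ["c", "d"], "c": [], "d": []}
print(bfs_distances(g, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 2}
```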
Question
How can I make this traversal fast and memory-efficient? Should I use `multiprocessing`, `concurrent.futures`, or another approach? I'm open to suggestions involving alternative algorithms (like bidirectional BFS), external databases, or memory-mapped files.
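For single-pair shortest paths, the bidirectional BFS I mention can shrink the explored frontier dramatically: search forward from the source and backward from the target, always expanding the smaller frontier, and stop when the two settled sets meet. A minimal sketch, assuming an undirected graph stored as a dict of adjacency lists (a directed graph would need a reversed adjacency for the backward side):

```python
def bidirectional_bfs(graph, source, target):
    """Shortest-path length between source and target, or -1 if disconnected.
    Expands the smaller of the two frontiers level by level."""
    if source == target:
        return 0
    dist_s, dist_t = {source: 0}, {target: 0}
    frontier_s, frontier_t = {source}, {target}
    while frontier_s and frontier_t:
        # Always grow the smaller side; swapping keeps the code symmetric.
        if len(frontier_s) > len(frontier_t):
            frontier_s, frontier_t = frontier_t, frontier_s
            dist_s, dist_t = dist_t, dist_s
        best = None
        next_frontier = set()
        for node in frontier_s:
            for nbr in graph.get(node, []):
                if nbr in dist_t:  # frontiers met; keep the best meeting point
                    cand = dist_s[node] + 1 + dist_t[nbr]
                    if best is None or cand < best:
                        best = cand
                elif nbr not in dist_s:
                    dist_s[nbr] = dist_s[node] + 1
                    next_frontier.add(nbr)
        if best is not None:
            return best
        frontier_s = next_frontier
    return -1

# Chain a-b-c-d-e: distance from a to e is 4
g = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(bidirectional_bfs(g, "a", "e"))  # 4
```

Finishing the whole level before returning (rather than returning on the first meeting) matters: different nodes in the same frontier can meet the other side at different depths, and the minimum over the level is the true shortest path.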
I’ve tried the following:

- A basic BFS using Python’s built-in `set`, `deque`, and `dict`, as shown in the code.
- Storing the entire adjacency list in memory using a dictionary of lists, but this doesn’t scale well for millions of nodes.
- Using the `multiprocessing` module to run multiple BFS instances in parallel from different starting points, but coordinating shared state and combining results without race conditions became complex.
- Looking into NetworkX, but I found it quite slow and memory-heavy for very large graphs.
- Reading adjacency data from a file line by line and processing it lazily, but the traversal logic became messy and error-prone.
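Regarding the dict-of-lists scaling problem: most of that memory goes to per-object overhead (hashed keys, list headers, boxed ints). One common remedy, sketched here with only the standard-library `array` module, is to relabel nodes as integers and pack the adjacency into two flat arrays in CSR (compressed sparse row) form; the function names and the 64-bit element width are my own choices:

```python
from array import array
from collections import deque

def build_csr(num_nodes, edge_list):
    """Pack directed integer edges (u, v) into CSR form:
    indices[indptr[u]:indptr[u+1]] are the neighbors of u."""
    counts = [0] * num_nodes
    for u, _ in edge_list:
        counts[u] += 1
    indptr = array("q", [0] * (num_nodes + 1))
    for u in range(num_nodes):
        indptr[u + 1] = indptr[u] + counts[u]
    indices = array("q", [0] * len(edge_list))
    cursor = list(indptr[:num_nodes])  # next write position per node
    for u, v in edge_list:
        indices[cursor[u]] = v
        cursor[u] += 1
    return indptr, indices

def bfs_csr(indptr, indices, start):
    """BFS over the CSR arrays; returns a distance list (-1 = unreachable)."""
    dist = [-1] * (len(indptr) - 1)
    dist[start] = 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for i in range(indptr[u], indptr[u + 1]):
            v = indices[i]
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
indptr, indices = build_csr(4, edges)
print(bfs_csr(indptr, indices, 0))  # [0, 1, 2, 2]
```

The two arrays cost roughly 8 bytes per edge plus 8 per node, versus hundreds of bytes per node for a dict of lists of Python ints, and the layout drops straight into `numpy` or a C extension later if needed.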
I expected to be able to:

- Traverse the graph quickly (within a few seconds) even with millions of nodes.
- Use multicore processing to speed up traversal.
- Efficiently manage memory without crashing due to RAM limitations.
- Ideally, stream data or keep only parts of the graph in memory at any given time.
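The streaming goal pairs naturally with the memory-mapped-file idea: if the adjacency is written to disk as fixed-width binary CSR arrays, `mmap` lets the OS page in only the regions the traversal actually touches. A minimal sketch using stdlib `mmap` and `struct`; the file layout (a node count followed by the `indptr` and `indices` arrays as little-endian 64-bit ints) is my own assumption, not any standard format:

```python
import mmap
import os
import struct
import tempfile
from collections import deque

WORD = struct.Struct("<q")  # one little-endian 64-bit int per entry

def write_csr_file(path, indptr, indices):
    """Write [num_nodes][indptr...][indices...] as flat 64-bit ints."""
    with open(path, "wb") as f:
        f.write(WORD.pack(len(indptr) - 1))
        for x in indptr:
            f.write(WORD.pack(x))
        for x in indices:
            f.write(WORD.pack(x))

def bfs_mmap(path, start):
    """BFS reading neighbor lists straight out of the memory-mapped file."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (n,) = WORD.unpack_from(mm, 0)
        indptr_off = WORD.size                    # indptr follows the count
        indices_off = indptr_off + (n + 1) * WORD.size

        def ptr(u):
            return WORD.unpack_from(mm, indptr_off + u * WORD.size)[0]

        dist = [-1] * n
        dist[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for i in range(ptr(u), ptr(u + 1)):
                (v,) = WORD.unpack_from(mm, indices_off + i * WORD.size)
                if dist[v] == -1:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

# Tiny demo: chain 0 -> 1 -> 2
fd, path = tempfile.mkstemp(suffix=".csr")
os.close(fd)
write_csr_file(path, [0, 1, 2, 2], [1, 2])
print(bfs_mmap(path, 0))  # [0, 1, 2]
```

Only the pages holding visited nodes' neighbor ranges are faulted in, so resident memory tracks the traversal's working set rather than the whole graph.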
I know Python isn’t the fastest language for this, but I’d like to push it as far as reasonably possible before considering rewriting in C++ or Rust.
You can use a combination of PySpark and GraphFrames to achieve this.
Sample Code
```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame

# Create Spark session
spark = SparkSession.builder \
    .appName("BFS") \
    .config("spark.jars.packages",
            "graphframes:graphframes:0.8.2-spark3.0-s_2.12") \
    .getOrCreate()

# Define vertices and edges as DataFrames
vertices = spark.createDataFrame([
    ("a", "Alice"),
    ("b", "Bob"),
    ("c", "Charlie"),
    ("d", "David"),
    ("e", "Esther"),
], ["id", "name"])

edges = spark.createDataFrame([
    ("a", "b"),
    ("b", "c"),
    ("c", "d"),
    ("d", "e"),
], ["src", "dst"])

# Create GraphFrame
g = GraphFrame(vertices, edges)

# Run BFS from "a" to "e"
results = g.bfs(fromExpr="id = 'a'", toExpr="id = 'e'")
results.show(truncate=False)
```