I am trying to use DFS and BFS to find all simple paths that have lengths up to a given k, starting at a given vertex in a directed graph. No cycles are allowed.
My code is as follows, and I have verified that both of the algorithms produce correct results.
I define the graph class and put the code of BFS and DFS in this class. There are about 190,000 vertices and 2,500,000 edges in my needed graph. The average out-degree of each vertex is 105:
public class AdjGraph {
private int V; // sum of vertices
private int E; // sum of edges
private List<Vertex> vertexList;
private Map<Vertex, List<Edge>> vertexAdj;
public AdjGraph() {
this.V = 0;
this.E = 0;
this.vertexList = new ArrayList<>();
this.vertexAdj = new HashMap<>();
}
public void addVertex(Vertex v) {
this.vertexList.add(v);
this.vertexAdj.put(v, new ArrayList<>());
this.V++;
}
public void addEdge(Edge e) {
Vertex startVertex = e.getStartVertex();
List<Edge> edgeList = this.vertexAdj.get(startVertex);
if (!edgeList.contains(e)) {
edgeList.add(e);
this.vertexAdj.put(startVertex, edgeList);
}
}
public List<Vertex> getVertexList() {
return vertexList;
}
// Version of DFS
public Map<Integer, List<List<Vertex>>> findAllPathsUpToLengthByDFS(Vertex start, int k) {
boolean[] visited = new boolean[vertexList.size()];
List<Vertex> path = new ArrayList<>();
Map<Integer, List<List<Vertex>>> allPaths = new HashMap<>();
findAllPathsUpToLengthByDFSUtil(start, k, visited, path, allPaths);
return allPaths;
}
// Used by the function findAllPathsUpToLengthByDFS(...)
private void findAllPathsUpToLengthByDFSUtil(Vertex u, int k, boolean[] visited, List<Vertex> path, Map<Integer, List<List<Vertex>>> allPaths) {
// Mark the current node as visited and store in path
visited[vertexList.indexOf(u)] = true;
path.add(u);
// If current path length exceeds k, backtrack
if (path.size() - 1 > k) {
visited[vertexList.indexOf(u)] = false;
path.remove(path.size() - 1);
return;
}
// If current path length is within k, add it to allPaths
int pathLength = path.size() - 1;
if (pathLength <= k) {
allPaths.computeIfAbsent(pathLength, x -> new ArrayList<>()).add(new ArrayList<>(path));
}
// Recur for all the vertices adjacent to this vertex
for (Edge edge : vertexAdj.get(u)) {
Vertex v = edge.getEndVertex();
if (!visited[vertexList.indexOf(v)]) {
findAllPathsUpToLengthByDFSUtil(v, k, visited, path, allPaths);
}
}
// Remove current vertex from path and mark it as unvisited
path.remove(path.size() - 1);
visited[vertexList.indexOf(u)] = false;
}
// Version of BFS
public Map<Integer, List<List<Vertex>>> findAllPathsUpToLengthByBFS(Vertex start, int k) {
Map<Integer, List<List<Vertex>>> allPaths = new HashMap<>();
Queue<List<Vertex>> queue = new LinkedList<>();
// Initialize the queue with the start vertex
List<Vertex> startPath = new ArrayList<>();
startPath.add(start);
queue.add(startPath);
while (!queue.isEmpty()) {
List<Vertex> path = queue.poll();
Vertex last = path.get(path.size() - 1);
// Add the path to allPaths if its length is within k
int pathLength = path.size() - 1;
if (pathLength <= k) {
allPaths.computeIfAbsent(pathLength, x -> new ArrayList<>()).add(new ArrayList<>(path));
}
// If the path length is already k, no need to explore further
if (pathLength < k) {
// Explore the neighbors of the last vertex in the path
for (Edge edge : vertexAdj.get(last)) {
Vertex next = edge.getEndVertex();
if (!path.contains(next)) { // Avoid cycles
List<Vertex> newPath = new ArrayList<>(path);
newPath.add(next);
queue.add(newPath);
}
}
}
}
return allPaths;
}
}
Vertex class:
public class Vertex {
// The origin names are not id1 and id2.
// I change them for simplicity.
private String id1;
private String id2;
public Vertex() {
}
public Vertex(String id1, String id2) {
this.id1 = id1;
this.id2 = id2;
}
// Getters and Setters
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null || getClass() != obj.getClass()) {
return false;
}
Vertex other = (Vertex) obj;
return id1.equals(other.id1) && id2.equals(other.id2);
}
@Override
public int hashCode() {
return Objects.hash(id1, id2);
}
}
Edge class:
public class Edge {
private Vertex startVertex;
private Vertex endVertex;
public Edge(Vertex startVertex, Vertex endVertex) {
this.startVertex = startVertex;
this.endVertex = endVertex;
}
// Getters for startVertex and endVertex
public Vertex getStartVertex() {
return startVertex;
}
public Vertex getEndVertex() {
return endVertex;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null || getClass() != obj.getClass()) {
return false;
}
Edge other = (Edge) obj;
return Objects.equals(startVertex, other.startVertex) &&
Objects.equals(endVertex, other.endVertex);
}
@Override
public int hashCode() {
return Objects.hash(startVertex, endVertex);
}
}
Here is an example. The data volume of this example is relatively small, so there is not obvious running time difference. When the graph is big, the difference is obvious. In my graph, BFS runs 2ms while DFS runs 15s when k = 1:
public static void main(String[] args) {
AdjGraph graph = new AdjGraph();
for (int i = 0; i < 1000; i++) {
Vertex v = new Vertex(String.valueOf(i), String.valueOf(i));
graph.addVertex(v);
}
List<Vertex> vertexList = graph.getVertexList();
for (int i = 0; i < vertexList.size(); i++) {
for (int j = 0; j < vertexList.size(); j++) {
if (j != i) {
graph.addEdge(new Edge(vertexList.get(i), vertexList.get(j)));
}
}
}
Vertex start = vertexList.get(0); // Start node
int k = 1; // Maximum path length
Long t1 = System.currentTimeMillis();
Map<Integer, List<List<Vertex>>> allPaths = graph.findAllPathsUpToLengthByBFS(start, k);
// Map<Integer, List<List<Vertex>>> allPaths = graph.findAllPathsUpToLengthByBFS(start, k);
Long t2 = System.currentTimeMillis();
System.out.println("Running time: " + (t2 - t1) + "ms");
}
Surprisingly, BFS is much quicker than DFS. Does anybody know why? Maybe it is because there are too many duplicate calculation in DFS? How can I optimize them?
Thanks!
The main reason is that in your DFS implementation, you still look for neighbors when you have already reached the maximum path length. In other words, your base case exit is one level deeper than you could have done, which means a lot of iterations are made without any possibility to add a new path.
This issue can be fixed by changing this part:
for (Edge edge : vertexAdj.get(u)) {
Vertex v = edge.getEndVertex();
if (!visited[vertexList.indexOf(v)]) {
findAllPathsUpToLengthByDFSUtil(v, k, visited, path, allPaths);
}
}
to this:
if (pathLength < k) { // *********
for (Edge edge : vertexAdj.get(u)) {
Vertex v = edge.getEndVertex();
if (!visited[vertexList.indexOf(v)]) {
findAllPathsUpToLengthByDFSUtil(v, k, visited, path, allPaths);
}
}
}
Note how this is exactly the pattern you used in your BFS implementation. And with this change you can remove the other base case handling you had:
// If current path length exceeds k, backtrack
if (path.size() - 1 > k) {
visited[vertexList.indexOf(u)] = false;
path.remove(path.size() - 1);
return;
}
There are however more things you can do to improve the DFS implementation, although these will have less dramatic effects:
visited[vertexList.indexOf(v)]
is a relatively slow operation when there are many vertices. It would be more efficient to use a hash set instead of an array.path
an ordered hash set (LinkedHashSet
), so that you can drop visited
and use path.contains
for that purpose.Here is a version of the DFS implementation that implements those changes. I have named the functions with a "2" at the end so you can include them in your project and compare the performance with your original functions:
// Improved Version of DFS (DFS2)
public Map<Integer, List<List<Vertex>>> findAllPathsUpToLengthByDFS2(Vertex start, int k) {
// Use an insertion-ordered set so you don't need a separate visited array
Set<Vertex> path = new LinkedHashSet<>();
Map<Integer, List<List<Vertex>>> allPaths = new HashMap<>();
findAllPathsUpToLengthByDFSUtil2(start, k, path, allPaths);
return allPaths;
}
private void findAllPathsUpToLengthByDFSUtil2(Vertex u, int k, Set<Vertex> path, Map<Integer, List<List<Vertex>>> allPaths) {
path.add(u);
int pathLength = path.size() - 1;
allPaths.computeIfAbsent(pathLength, x -> new ArrayList<>()).add(new ArrayList<>(path));
if (pathLength < k) { // This is where you should stop recursion
for (Edge edge : vertexAdj.get(u)) {
Vertex v = edge.getEndVertex();
if (!path.contains(v)) {
findAllPathsUpToLengthByDFSUtil2(v, k, path, allPaths);
}
}
}
path.remove(u);
}
Although DFS has an overhead of the call stack, in terms of memory the advantages of DFS over BFS start to weigh in when k
gets greater. For instance, I tried your main
code, but with a graph of just 25 nodes (instead of 1000), but with k
equal to 5. Then we see that BFS turns out to be slower than DFS. This is mainly due to the many partial paths that it stores in memory, while DFS -- by nature -- only uses one partial path.