I'm solving the problem described below and can't think of a better algorithm than trying every permutation of every vertex of every group with every.
I'm given a graph of vertices, along with a list of groups of specific vertices. The goal is to find the shortest path from a specific starting vertex to a specific ending vertex, where the path must pass through at least one vertex from each specified group. There are also vertices in the graph that are not part of any given group. Revisiting vertices and edges is allowed.
The graph data is specified as follows:
I would be grateful for any ideas for a better solution, thank you.
We can use bitmasks to efficiently track which groups we have visited so far, and combine this with a traditional shortest-path algorithm (BFS or Dijkstra's).
If we assume E edges, V vertices, and K vertex-groups that have to be included, the below algorithm has a time complexity of O((V + E) * 2^K) and a space complexity of O(V * 2^K). The exponential 2^K term means it will only work for a relatively small K, say up to 10 or 20.
First, are the edges weighted?
If yes, then a "shortest path" algorithm will usually be a variation of Dijkstra's algorithm, in which we keep a (min) priority queue of the shortest paths found so far. We only visit a node once it's at the top of the queue, meaning that this must be the shortest path to this node. Any shorter path to this node would already have been added to the priority queue and would have come out before the current iteration. (Note: this doesn't work with negative edge weights.)
If no, meaning all edges have the same weight, then there is no need to maintain a priority queue of the shortest paths. We can instead just run a regular breadth-first search (BFS), in which we maintain a deque with all nodes at the current depth. At each step we iterate over all nodes at the current depth (popping them from the left of the deque), and for each node we add all its not-yet-visited neighbors to the right side of the deque, forming the next level.
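As a rough illustration of the unweighted case, here is a minimal BFS sketch in Python. The function name bfs_distances and the adjacency format (a dict mapping each vertex to a list of neighbors) are assumptions made for this example, not something given in the question.

```python
from collections import deque

def bfs_distances(adj, start):
    """Return a dict of hop counts from start to every reachable vertex.

    adj is assumed to map each vertex to a list of its neighbors.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()          # pop from the left of the deque
        for v in adj[u]:
            if v not in dist:        # only not-yet-visited neighbors
                dist[v] = dist[u] + 1
                queue.append(v)      # push to the right, forming the next level
    return dist
```

Because every vertex enters the deque at most once, the first time we reach a vertex is also via a shortest path to it.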
The below algorithm works for both BFS and Dijkstra's, but for simplicity's sake I'll assume for the rest of the answer that the edges have positive weights and that we use Dijkstra's. The important takeaway is that with either algorithm we only "visit" or "explore" a node via a path that must be the shortest path to that node. This property is essential for the algorithm to be efficient, since we know that we will visit each of the V nodes and E edges at most once, giving us a time complexity of O(V + E). If we use Dijkstra's we have to multiply this by log(V) for the priority queue usage (this also applies to the time complexity mentioned in the summary).
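For reference, a minimal sketch of plain Dijkstra's in Python, before any group handling is added. The name dijkstra and the adjacency format (adj[u] is a list of (neighbor, weight) pairs) are assumptions for this example. It uses the common "lazy deletion" variant of the priority queue, which skips outdated entries instead of decreasing keys.

```python
import heapq

def dijkstra(n, adj, start):
    """Return shortest distances from start to all n vertices.

    adj[u] is assumed to be a list of (v, weight) pairs with weight >= 0.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[start] = 0
    heap = [(0, start)]              # (distance so far, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                 # outdated entry, a shorter path was found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:         # found a shorter path to v
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```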
In our case we have the additional complexity that we have K vertex-groups, and our shortest path has to contain at least one node from each of them. This is a big problem, since it destroys our ability to simply go along with the "shortest current path".
See for example this simple graph. Notation: -- means an edge, start is the start node, and end is the end node. A vertex with value 0 does not belong to a vertex-group, and a vertex with value >= 1 belongs to the vertex-group of that index.
end -- 0 -- 2 -- start -- 1 -- 2
It is clear that the optimal path will first move right to the node in group 1, and then move left until the end. But this is impossible to do for the BFS and Dijkstra's algorithms we introduced above! After we move from the start to the right to capture the node in group 1, we would never move back left to the start, since we have already been there with a shorter path.
In the above example, if the right-hand side had looked like start -- 0 -- 0, where 0 means the vertex does not belong to a group, then it would be of no use to go there and back to the start.
The decisive reason why it makes sense to go there and come back, although the path gets longer, is that it includes a group that we have not seen before.
How can we keep track of which groups are already included at the current position? The most efficient solution is a bitmask. For example, if we have already visited a node of group 2 and a node of group 4, then the bitmask would have a bit set at positions 2 and 4, and it would have the value 2^2 + 2^4 == 4 + 16 == 20.
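The arithmetic above can be checked with a few lines of Python. The helper names mark_group and has_group are made up for this illustration; note that in Python the power of two is written 1 << g, since ^ is XOR there.

```python
def mark_group(mask, group_id):
    """Return mask with the bit for group_id set."""
    return mask | (1 << group_id)

def has_group(mask, group_id):
    """Return True if the bit for group_id is set in mask."""
    return (mask >> group_id) & 1 == 1

mask = 0
mask = mark_group(mask, 2)   # visited a node of group 2
mask = mark_group(mask, 4)   # visited a node of group 4
print(mask)                  # 4 + 16
```

Setting a bit that is already set leaves the mask unchanged, so visiting a second node of the same group does not alter the state.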
In regular Dijkstra's we would just keep a one-dimensional array of size V to keep track of the shortest path length to each vertex, initialized to a very high MAX value. array[start] begins with value 0.
We can modify this method to instead have a two-dimensional array of dimensions [2^K][V], where K is the number of groups. Every value is initialized to MAX; only array[mask_value_of_start][start] begins with 0.
The value we store at array[mask][node] answers the question: "Given the already visited groups with bitmask value mask, what is the length of the shortest path to reach this node?"
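A minimal sketch of how such a table could be set up in Python. The function name make_distance_table is hypothetical, and the sketch assumes group bits are numbered from 0 to K - 1 so that every mask fits in the 2^K rows.

```python
def make_distance_table(num_vertices, num_groups, start, start_mask):
    """Build the [2^K][V] table of best-known path lengths.

    Assumes group bits are numbered 0 .. num_groups - 1, and that
    start_mask already has the bits of the start vertex's groups set.
    """
    INF = float("inf")  # plays the role of MAX
    table = [[INF] * num_vertices for _ in range(1 << num_groups)]
    table[start_mask][start] = 0
    return table

# 6 vertices, 2 groups, start vertex 3 belongs to no group (mask 0).
table = make_distance_table(6, 2, start=3, start_mask=0)
```

Each row is built separately in the list comprehension so that the rows do not share the same underlying list.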
Once we have this structure, we can suddenly use Dijkstra's again (it's the same for BFS). We simply change the rules a bit:
--> In our modification we differentiate by mask and never re-visit a node if it has already been visited for that particular mask.
--> In our modification we look at all neighbors, and update the mask we use to check each neighbor like: neighbor_mask = mask | (1 << neighbor_group_id). We only add a {neighbor_mask, neighbor} pair to the priority queue if we managed to decrease the minimal path length stored in array[neighbor_mask][neighbor].
--> In our modification we only visit nodes that are not yet explored for their respective mask values. We also always visit the current shortest path among all masks, meaning that for any given mask it must be the shortest path.
--> In regular Dijkstra's we can return once we visit the end node, since we are sure we got the shortest path to it. In our modification we can return once we visit the end node for the full mask, meaning the mask containing all groups, since it must be the shortest path for the full mask. This is the answer to our problem.
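Putting the modified rules together, here is a minimal Python sketch under some assumptions of mine: the function name shortest_path_through_groups is made up, vertices are numbered 0 to n - 1, edges are undirected (u, v, weight) triples with non-negative weights, and groups is a list of vertex lists whose list index is used as the bit position. A vertex may belong to several groups, so the sketch precomputes one mask per vertex instead of a single group id.

```python
import heapq

def shortest_path_through_groups(n, edges, groups, start, end):
    """Return the length of the shortest walk from start to end that touches
    at least one vertex of every group, or None if no such walk exists."""
    # Undirected adjacency list: adj[u] = [(v, weight), ...]
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # vertex_mask[v] has bit i set if vertex v belongs to groups[i].
    vertex_mask = [0] * n
    for i, group in enumerate(groups):
        for v in group:
            vertex_mask[v] |= 1 << i

    full_mask = (1 << len(groups)) - 1
    INF = float("inf")
    # dist[mask][v]: shortest known length to reach v with `mask` covered.
    dist = [[INF] * n for _ in range(full_mask + 1)]
    start_mask = vertex_mask[start]
    dist[start_mask][start] = 0
    heap = [(0, start_mask, start)]

    while heap:
        d, mask, u = heapq.heappop(heap)
        if d > dist[mask][u]:
            continue                      # outdated entry for this (mask, u)
        if u == end and mask == full_mask:
            return d                      # first pop of the goal state is optimal
        for v, w in adj[u]:
            new_mask = mask | vertex_mask[v]
            nd = d + w
            if nd < dist[new_mask][v]:    # improved this (mask, vertex) state
                dist[new_mask][v] = nd
                heapq.heappush(heap, (nd, new_mask, v))
    return None

# The example graph: end -- 0 -- 2 -- start -- 1 -- 2, numbered left to right
# as vertices 0..5, all edges with weight 1.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1)]
groups = [[4], [2, 5]]                    # group 1 = {4}, group 2 = {2, 5}
result = shortest_path_through_groups(6, edges, groups, start=3, end=0)
```

On the example graph this walks right to the group-1 vertex, comes back through the start, and continues left to the end. For unweighted graphs the heap could be swapped for a deque, as in regular BFS.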
That's it! Because time and space complexity depend exponentially on the number of groups K, this will only work for very small K (of course depending on the number of nodes and edges).
If this is too slow for your requirements, then there might be a more sophisticated algorithm that someone smarter can come up with; it will probably involve dynamic programming.
It is very possible that this is still too slow, in which case you will probably want to switch to some heuristic that sacrifices accuracy in order to gain more speed.