I have been browsing posts on this site, the library's documentation, and discussions and explanations on other sites. However, I am struggling to understand how the C++ Boost Graph library works in my application. My questions are highlighted in bold. I list them chronologically regarding what part of the function they refer to. The bottom question is the most essential one though.
Consider the following example using C++ 17:
#include <vector>
#include <boost/config.hpp>
#include <boost/graph/graph_traits.hpp>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <boost/property_map/property_map.hpp>
typedef boost::adjacency_list <boost::listS, boost::vecS, boost::directedS, boost::no_property, boost::property<boost::edge_weight_t, double> > graph_t;
typedef boost::graph_traits<graph_t>::vertex_descriptor vertex_descriptor;
void example_paths(std::vector<int>& from, std::vector<int>& to, std::vector<double>& weights, int source, std::vector<int>& targets) {
}
The input vectors from
, to
, and weights
have around 2.2 billion elements each and denote the edges of the graph. from
hosts the origin IDs, to
the destination IDs, and weights
the edge weights. I.e. the first edge goes from vertex from[0]
to vertex to[0]
and has weight weights[0]
. This data creates a graph with around 250 million vertices. Almost all vertices are connected to the main graph. Strictly speaking, it is thus a multi-graph, are however a graph made up of multiple graphs is called in this library. From that graph, I want to compute the shortest paths from the vertex with the ID source
to the vertices with the IDs targets
.
In my code, I follow the documentation's example and define the OutEdgeList
as a boost::listS
and the VertexList
as a boost::vecS
. I read that the types (boost::vecS
, boost::listS
, boost::slistS
, boost::setS
, boost::multisetS
, and boost::hash_setS
) available for OutEdgeList
and VertexList
differ in terms of computational efficiency. As I have to consider the total computational efficiency, taking both the graph construction from the input vectors and the application of Dijkstra's algorithm into account, I do not know which type I should choose. What types am I supposed to use for OutEdgeList
and VertexList
?
That brings me to my second question. I have found two drastically different methods of adjacency list construction:
graph_t g;
for(int i = 0; i < from.size(); i++) {
boost::add_edge (from[i], to[i], weights[i], g);
}
and
typedef std::pair< int, int > Edge;
const int num_nodes = 5;
enum nodes { A, B, C, D, E };
char name[] = "ABCDE";
Edge edge_array[] = { Edge(A, C), Edge(B, B), Edge(B, D), Edge(B, E),
Edge(C, B), Edge(C, D), Edge(D, E), Edge(E, A), Edge(E, B) };
int weights[] = { 1, 2, 1, 2, 7, 3, 1, 1, 1 };
int num_arcs = sizeof(edge_array) / sizeof(Edge);
graph_t g(edge_array, edge_array + num_arcs, weights, num_nodes);
where the second approach is taken from the documentation and not adjusted to my application. Which approach is more efficient? If it is the second one, what would be the translation for my example? And is it a problem that I use numeric instead of character IDs?
In general, I want to implement three variants of this shortest path function. The first variant only obtains the vertex IDs along the shortest paths. I.e. it generates a std::vector<std::vector<int> >
where the first element is a vector from source
to targets[0]
etc. The second variant only returns the distances, i.e. a std::vector<double>
where the first element is the sum of edge weights along the shortest path between source
and targets[0]
etc. The third variant returns the output of both the other variants. From what I understand, calculating the shortest paths works something like this
std::vector<vertex_descriptor> p(num_vertices(g));
std::vector<double> d(num_vertices(g));
vertex_descriptor s = vertex(1, g);
std::set<vertex_descriptor> t = {vertex(5, g), vertex(7, g)};
boost::dijkstra_shortest_paths(g, s, boost::predecessor_map(&p[0]).distance_map(&d[0]).visitor(target_visit(targets, on_examine_vertex())));
where I can add or leave out parts, depending on whether I want to extract the distances or the vertex IDs along the paths. Though, I do not really understand how it works. How do I call boost::dijkstra_shortest_paths
to obtain the std::vector
s in each of the three function variants described above?
I am sorry for this lengthy post. I suppose that these issues are intuitive for someone experienced with this library, but I have been stuck on them for days. I am aware that some people on this platform are sensitive to questions i.a. asking about efficiency, but dealing with billions of edges requires computational efficiency to be a primary consideration.
Please do not hesitate to ask questions in case parts of my post require clarification.
Thanks in advance.
How do I call boost::dijkstra_shortest_paths to obtain the std::vectors in each of the three function variants described above?
One call to boost::dijkstra_shortest_paths() gives you all the data that you need. To extract the data in the form you want requires working through the 'syntactical sugar' that encumbers boost::graph
Variant 1 : paths from source to all targets
This requires backtracking from target to source through the parent vector. Like this:
std::vector<std::vector<int>> Variant1(
graph_t &g,
const std::vector<int> &p)
{
std::vector<std::vector<int>> ret;
for (int target = 1; target < name.length(); target++)
{
std::vector<int> path;
int vertex = target;
while (vertex != 0)
{
path.push_back(vertex);
vertex = p[vertex];
}
std::reverse(path.begin(), path.end());
ret.push_back(path);
}
return ret;
}
Variant 2 : distance from source to all targets
The distance vector gives you this directly
Variant 3 : returns the output of both the other variants.
Just what it says.
You can do all three in one pass, something like this:
dijkstra_shortest_paths(
g, s,
predecessor_map(make_iterator_property_map(
p.begin(), get(vertex_index, g)))
.distance_map(make_iterator_property_map(
d.begin(), get(vertex_index, g))));
displayVariant1(Variant1(g, p));
displayVariant2(g, d);
displayVariant3(g, p, d);
which outputs:
Variant1:
Path from A to C: C D E B
Path from A to C: C
Path from A to C: C D
Path from A to C: C D E
Variant2:
distance from A to A = 0
distance from A to B = 6
distance from A to C = 1
distance from A to D = 4
distance from A to E = 5
Variant3:
Path from A to A: C D E B Total distance: 0
Path from A to B: C Total distance: 6
Path from A to C: C D Total distance: 1
Path from A to D: C D E Total distance: 4
Path from A to E: Total distance: 5
Note that if you only want one of variant 1 or 2 without 3 at all, you can omit the unneeded map. Like this:
dijkstra_shortest_paths(
g, s,
distance_map(make_iterator_property_map(
d.begin(), get(vertex_index, g))));
displayVariant2(g, d);
The complete code for the entire application is at https://gist.github.com/JamesBremner/30a8120ffac6ce884a5ee290eff9d08e
For my general comment on dealing with the baroque boost::graph docs see https://stackoverflow.com/a/1514439/16582
The above answers your question as posed. But, this is NOT what I would do for real work. So, here is some general, unsolicited advice.
Use bundled properties rather than property maps. This is much simpler. https://www.boost.org/doc/libs/1_82_0/libs/graph/doc/bundles.html
The boost::graph application programming interface ( API ) is both arcane and outdated. To simplify routine coding I would hide the the API in a thin wrapper class. As an example: https://github.com/JamesBremner/so76842374
The boost::graph API is the way it is because it is optimized to be generic, i.e. extremely flexible and applicable to almost any graph theory problem. It is NOT optimized for performance. To handle billions of edges, I think you will need something more streamlined. For example, PathFinder is able to find the shortest paths from a source to every other vertex in a graph with millions of edges in seconds. ( details at https://github.com/JamesBremner/PathFinder/wiki/Costs#performance )