I have a very large graph dot file and want to extract a subgraph from a given vertex. This subgraph should contain the given vertex and all vertexes below it.
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/graphviz.hpp>
#include <iostream>
struct Vertex {
std::string node;
};
using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS, Vertex>;
template <typename Fn>
requires std::is_invocable_r_v<bool, Fn, Graph::vertex_descriptor>
void removeVertexIf(Graph& graph, Fn const& fn) {
Graph::vertex_descriptor count = num_vertices(graph);
Graph::vertex_descriptor i = 0;
while (i < count) {
if (fn(i)) {
clear_vertex(i, graph);
remove_vertex(i, graph);
--count;
}
else {
++i;
}
}
}
struct DFSVisitor : boost::default_dfs_visitor {
DFSVisitor(std::vector<bool>& reachable)
: reachable(reachable)
{}
void discover_vertex(Graph::vertex_descriptor const index, Graph const&) {
reachable[index] = true;
}
std::vector<bool>& reachable;
};
void removeUnreachable(Graph& graph, Graph::vertex_descriptor const start_index) {
std::vector<bool> reachable(num_vertices(graph), false);
DFSVisitor visitor(reachable);
depth_first_search(graph, boost::visitor(visitor).root_vertex(start_index));
removeVertexIf(graph, [&](Graph::vertex_descriptor const index) {
return reachable[index];
});
}
int main() {
std::istringstream input(
"digraph{"
"0;1;2;3;4;5;6;7;8;9;"
"0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;"
"}");
Graph g(0);
boost::dynamic_properties dp(boost::ignore_other_properties);
dp.property("node_id", get(&Vertex::node, g));
boost::read_graphviz(input, g, dp);
// delete everything that is not below Node 6
removeUnreachable(g, 6);
boost::write_graphviz(std::cout, g);
}
In this minimal example, the given vertex has the node ID 6. The following graphic shows what I want to extract:
How can I remove the nodes and edges that are not below Node 6? My current removeUnreachable
iterates over the entire graph instead of starting at start_index
by depth_first_search
.
Dot file to SVG graphic:
dot -Tsvg out.dot -o out.svg
You already figured out that depht_first_search
does more than you thought it did. Instead of complicating the visitor, I'd suggest to use depht_first_visit
instead: https://www.boost.org/doc/libs/1_83_0/libs/graph/doc/depth_first_visit.html
Adjacency lists with vertex container selector vecS
have an implied contiguous integral vertex index, which doubles as the descriptor in that case. You must have been somewhat aware of this because your spelled it out:
Graph::vertex_discriptor count = num_vertices(graph);
When you remove an early vertex, you are effectively renumbering all vertices with higher index. This make it so that your removeVertexIf
loop invalidates the values inside the reachable
map.
One way to avoid this would be to go by the name property (Vertex::node
in your example). Another way is to renumber your unreachable
entries in parallel with the removal, but this breaks the encapsulation of the predicate function: the predicate now must know about the remove algorithms internals.
Another option, of course, would be to have a (temporary) extra mapping that indirects the original vertex index to the current index.
Lastly you could select a vertex container that has reference and descriptor stability (e.g. setS
and listS
).
Note that it may be much more performant to NOT REMOVE any vertices, instead just filtering them out on the fly. I'll present this as the BONUS take below
Plenty of options, let's go with the simplest:
See it Live On Coliru
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/graphviz.hpp>
#include <iostream>
using Graph = boost::adjacency_list<boost::vecS, boost::listS, boost::directedS, //
boost::property<boost::vertex_index_t, size_t>>;
using V = Graph::vertex_descriptor;
using VSet = std::set<V>;
template <typename Pred>
requires std::predicate<Pred, V>
void removeVertexIf(Graph& g, Pred pred) {
for (auto [it, e] = vertices(g); it != e;) {
if (auto v = *it++; pred(v)) {
clear_vertex(v, g);
remove_vertex(v, g);
}
}
}
struct DFSVisitor : boost::default_dfs_visitor {
DFSVisitor(VSet& reachable) : reachable(reachable) {}
void discover_vertex(V v, Graph const& g) const {
std::cout << "Marking " << get(boost::vertex_index, g, v) << " reachable\n";
reachable.insert(v);
}
VSet& reachable;
};
void removeUnreachable(Graph& g, V start) {
VSet reachable;
std::vector<boost::default_color_type> colors(num_vertices(g));
DFSVisitor visitor(reachable);
auto idx = get(boost::vertex_index, g);
depth_first_visit(g, start, visitor, make_iterator_property_map(colors.begin(), idx));
removeVertexIf(g, [&](V v) { return !reachable.contains(v); });
}
int main() {
Graph g;
auto idx = get(boost::vertex_index, g);
{
std::istringstream input("digraph{0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;}");
boost::dynamic_properties dp(boost::ignore_other_properties);
dp.property("node_id", idx);
read_graphviz(input, g, dp);
}
// delete everything that is not below Node 6
auto v6 = vertex(6, g);
assert(idx[v6] == 6);
removeUnreachable(g, v6);
write_graphviz(std::cout, g);
}
Of course you can make it work for vecS
iff you remove in the correct order:
template <typename Pred>
requires std::predicate<Pred, V>
void removeVertexIf(Graph& g, Pred pred) {
for (auto v : boost::adaptors::reverse(vertices(g))) {
if (pred(v)) {
clear_vertex(v, g);
remove_vertex(v, g);
}
}
}
Note though that you get the output you should expect: Live On Coliru
digraph G {
0;
1;
2;
3;
4;
0->3 ;
1->2 ;
1->0 ;
2->3 ;
3->4 ;
}
To keep the original node ids, make it explicit:
write_graphviz_dp(std::cout, g, dp);
Now printing Live On Coliru:
digraph G {
5;
6;
7;
8;
9;
5->8 ;
6->7 ;
6->5 ;
7->8 ;
8->9 ;
}
The performance of clear_vertex
and remove_vertex
is going to make you cry¹. Instead, just filter for your target vertices:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/filtered_graph.hpp>
#include <boost/graph/graphviz.hpp>
#include <iostream>
struct VProp { size_t original_idx; };
using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS, VProp>;
using V = Graph::vertex_descriptor;
using VSet = std::set<V>;
struct DFSVisitor : boost::default_dfs_visitor {
DFSVisitor(VSet& reachable) : reachable(reachable) {}
void discover_vertex(V v, Graph const&) const { reachable.insert(v); }
VSet& reachable;
};
VSet reachable(Graph& g, V start) {
VSet reachable;
std::vector<boost::default_color_type> colors(num_vertices(g));
DFSVisitor visitor(reachable);
depth_first_visit(g, start, visitor, colors.data());
return reachable;
}
int main() {
Graph g;
auto idx = get(&VProp::original_idx, g);
boost::dynamic_properties dp(boost::ignore_other_properties);
dp.property("node_id", idx);
{
std::istringstream input("digraph{0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;}");
read_graphviz(input, g, dp);
}
// filter subgraph below Node 6
auto const vv = reachable(g, vertex(6, g));
write_graphviz_dp(
std::cout,
boost::filtered_graph(g, boost::keep_all{}, std::function([&vv](V v) { return vv.contains(v); })),
dp);
}
Printing, again:
digraph G {
5;
6;
7;
8;
9;
5->8 ;
6->7 ;
6->5 ;
7->8 ;
8->9 ;
}
¹ see e.g. Remove 100,000+ nodes from a Boost graph