I'm creating graphs from a JSON array file and I'd like it so that when I traverse through both edges and vertices (e.g. in_edges(), vertices() or topological_sort() ) of the graph, it's in the exact same order regardless of the insertion order. In other words, to produce the same graph even if I shuffle the order of the JSON.
The JSON could look like this:
[
{
"ID": 10,
"PARENTS": [2, 1, 85]
},
{
"ID": 1,
"PARENTS": []
},
{
"ID": 99,
"PARENTS": [2]
},
{
"ID": 2,
"PARENTS": []
},
{
"ID": 85,
"PARENTS": [99]
},
]
As an example, if the insertion order was as the example JSON, 10 -> 1 -> 99 -> 2 -> 85, then the parents of 10 would come up as 1 2 85. However, if the insertion order was instead 85 -> 2 -> 1 -> 99 -> 10, then the parents of 10 would come up as 85 2 1. This is what I'd like to avoid.
Right now I'm considering adding a wrapper around a boost graph and sorting the vertices with a set when you call getParent() for example but I wanted to know if there was an easier way to do achieve this with boost natively.
Another possibility is to sort the JSON beforehand but I'd like to avoid that time complexity.
I'd like to avoid that time complexity
Given the laws of thermodynamics, this cannot be avoided. You explicitly state that you want normalized/canonical results, which means you have to expend the energy to ... normalize the data, regardless of whether it's done in pre- or post-processing.
Prefer to do it early, so the effort is not repeated.
Remember Boost Graph is already deterministic. What you want is to treat the input in normalized fashion. Let's reproduce the issue first, given two JSON documents doc1
and doc2
from the question:
#include <boost/json.hpp>
namespace json = boost::json;
auto opts = json::parse_options{.allow_trailing_commas=true};
auto doc1 = json::parse(
R"( [
{ "ID": 10, "PARENTS": [2, 1, 85] },
{ "ID": 1, "PARENTS": [] },
{ "ID": 99, "PARENTS": [2] },
{ "ID": 2, "PARENTS": [] },
{ "ID": 85, "PARENTS": [99] },
])", {}, opts);
auto doc2 = json::parse(
R"([
{ "PARENTS": [99] , "ID": 85 },
{ "PARENTS": [] , "ID": 2 },
{ "PARENTS": [] , "ID": 1 },
{ "PARENTS": [2] , "ID": 99 },
{ "PARENTS": [2, 1, 85], "ID": 10 },
])", {}, opts);
If we 1:1 translate these to boost::adjacency_list<>
:
#include <boost/graph/adjacency_list.hpp>
using Id = int64_t;
namespace Builder {
struct Node {
using Parents = std::vector<int64_t>;
Id id;
Parents parents;
friend Node tag_invoke(json::value_to_tag<Node>, json::value const& j) {
return {j.at("ID").as_int64(), value_to<Parents>(j.at("PARENTS"))};
}
};
auto Read(json::value const& doc) {
boost::adjacency_list<boost::vecS, boost::listS, boost::directedS, Id> g;
for (auto& [id, parents] : value_to<std::vector<Node>>(doc))
for (auto parent : parents)
add_edge(parent, id, g);
return g;
}
} // namespace Builder
We indeed see the different graphs as expected: Live On Coliru
int main() {
std::cout << doc1 << "\n";
std::cout << doc2 << "\n";
std::cout << (doc1 == doc2 ? "EQUAL" : "DIFFERENT") << "\n";
auto g1 = Builder::Read(doc1);
auto g2 = Builder::Read(doc2);
std::cout << g1 << "\n";
std::cout << g2 << "\n";
auto s1 = boost::lexical_cast<std::string>(g1);
auto s2 = boost::lexical_cast<std::string>(g2);
std::cout << (s1 == s2 ? "EQUAL" : "DIFFERENT") << "\n";
}
Printing
[{"ID":10,"PARENTS":[2,1,85]},{"ID":1,"PARENTS":[]},{"ID":99,"PARENTS":[2]},{"ID":2,"PARENTS":[]},{"ID":85,"PARENTS":[99]}]
[{"PARENTS":[99],"ID":85},{"PARENTS":[],"ID":2},{"PARENTS":[],"ID":1},{"PARENTS":[2],"ID":99},{"PARENTS":[2,1,85],"ID":10}]
DIFFERENT
10 -->
2 --> 10 99
1 --> 10
85 --> 10
99 --> 85
85 --> 10
99 --> 85
2 --> 99 10
10 -->
1 --> 10
DIFFERENT
In this case we can very simply cause the nodes to be inserted in a canonical ordering of our choosing by replacing:
for (auto& [id, parents] : value_to<std::vector<Node>>(doc))
with
for (auto& [id, parents] : value_to<std::set<Node>>(doc))
For simplicity let's go with the default ordering based on each node properties:
struct Node { /*...*/
auto operator<=>(Node const&) const = default;
Now we get Live On Coliru
// ...
10 -->
2 --> 10 99
1 --> 10
85 --> 10
99 --> 85
EQUAL
You didn't ask, but how about the order of the parents?
auto doc3 = json::parse(
R"([
{ "PARENTS": [99] , "ID": 85 },
{ "PARENTS": [] , "ID": 2 },
{ "PARENTS": [] , "ID": 1 },
{ "PARENTS": [2] , "ID": 99 },
{ "PARENTS": [85, 2, 1], "ID": 10 },
])", {}, opts);
Now gives
g2 EQUAL
g3 DIFFERENT
In pre-processing again coercing the adjacencies into an ordered container (Live):
using Parents = std::multiset<int64_t>; // g3 EQUAL
- To disallow duplicate edges, use
std::set
- In the graph model with
multisetS
doesn't quite do enough because it orders on the descriptor.