Boost Graphs - deterministic graph behaviour from JSON

I'm creating graphs from a JSON array file and I'd like it so that when I traverse through both edges and vertices (e.g. in_edges(), vertices() or topological_sort() ) of the graph, it's in the exact same order regardless of the insertion order. In other words, to produce the same graph even if I shuffle the order of the JSON.

The JSON could look like this:

[
    {
        "ID": 10,
        "PARENTS": [2, 1, 85]
    },
    {
        "ID": 1,
        "PARENTS": []
    },
    {
        "ID": 99,
        "PARENTS": [2]
    },
    {
        "ID": 2,
        "PARENTS": []
    }, 
    {
        "ID": 85,
        "PARENTS": [99]
    },
]

As an example, if the insertion order was as the example JSON, 10 -> 1 -> 99 -> 2 -> 85, then the parents of 10 would come up as 1 2 85. However, if the insertion order was instead 85 -> 2 -> 1 -> 99 -> 10, then the parents of 10 would come up as 85 2 1. This is what I'd like to avoid.

Right now I'm considering adding a wrapper around a boost graph and sorting the vertices with a set when you call getParent() for example but I wanted to know if there was an easier way to do achieve this with boost natively.

Another possibility is to sort the JSON beforehand but I'd like to avoid that time complexity.

Solution

I'd like to avoid that time complexity

Given the laws of thermodynamics, this cannot be avoided. You explicitly state that you want normalized/canonical results, which means you have to expend the energy to ... normalize the data, regardless of whether it's done in pre- or post-processing.

Prefer to do it early, so the effort is not repeated.

Remember Boost Graph is already deterministic. What you want is to treat the input in normalized fashion. Let's reproduce the issue first, given two JSON documents doc1 and doc2 from the question:

#include <boost/json.hpp>
namespace json = boost::json;

auto opts = json::parse_options{.allow_trailing_commas=true};
auto doc1 = json::parse(
    R"( [
    { "ID": 10, "PARENTS": [2,  1, 85] },
    { "ID": 1,  "PARENTS": []   },
    { "ID": 99, "PARENTS": [2]  },
    { "ID": 2,  "PARENTS": []   },
    { "ID": 85, "PARENTS": [99] },
])", {}, opts);

auto doc2 = json::parse(
    R"([
     { "PARENTS": [99]       , "ID": 85  },
     { "PARENTS": []         , "ID": 2   },
     { "PARENTS": []         , "ID": 1   },
     { "PARENTS": [2]        , "ID": 99  },
     { "PARENTS": [2,  1, 85], "ID": 10  },
])", {}, opts);

If we 1:1 translate these to boost::adjacency_list<>:

#include <boost/graph/adjacency_list.hpp>
using Id = int64_t;

namespace Builder {
    struct Node {
        using Parents = std::vector<int64_t>;
        Id      id;
        Parents parents;

        friend Node tag_invoke(json::value_to_tag<Node>, json::value const& j) {
            return {j.at("ID").as_int64(), value_to<Parents>(j.at("PARENTS"))};
        }
    };

    auto Read(json::value const& doc) {
        boost::adjacency_list<boost::vecS, boost::listS, boost::directedS, Id> g;

        for (auto& [id, parents] : value_to<std::vector<Node>>(doc))
            for (auto parent : parents)
                add_edge(parent, id, g);

        return g;
    }
} // namespace Builder

We indeed see the different graphs as expected: Live On Coliru

int main() {
    std::cout << doc1 << "\n";
    std::cout << doc2 << "\n";
    std::cout << (doc1 == doc2 ? "EQUAL" : "DIFFERENT") << "\n";

    auto g1 = Builder::Read(doc1);
    auto g2 = Builder::Read(doc2);

    std::cout << g1 << "\n";
    std::cout << g2 << "\n";

    auto s1 = boost::lexical_cast<std::string>(g1);
    auto s2 = boost::lexical_cast<std::string>(g2);
    std::cout << (s1 == s2 ? "EQUAL" : "DIFFERENT") << "\n";
}

Printing

[{"ID":10,"PARENTS":[2,1,85]},{"ID":1,"PARENTS":[]},{"ID":99,"PARENTS":[2]},{"ID":2,"PARENTS":[]},{"ID":85,"PARENTS":[99]}]
[{"PARENTS":[99],"ID":85},{"PARENTS":[],"ID":2},{"PARENTS":[],"ID":1},{"PARENTS":[2],"ID":99},{"PARENTS":[2,1,85],"ID":10}]
DIFFERENT
10 --> 
2 --> 10 99 
1 --> 10 
85 --> 10 
99 --> 85 

85 --> 10 
99 --> 85 
2 --> 99 10 
10 --> 
1 --> 10 

DIFFERENT

Preprocessing Nodes

In this case we can very simply cause the nodes to be inserted in a canonical ordering of our choosing by replacing:

for (auto& [id, parents] : value_to<std::vector<Node>>(doc))

with

for (auto& [id, parents] : value_to<std::set<Node>>(doc))

For simplicity let's go with the default ordering based on each node properties:

struct Node { /*...*/
    auto operator<=>(Node const&) const = default;

Now we get Live On Coliru

// ...
10 --> 
2 --> 10 99 
1 --> 10 
85 --> 10 
99 --> 85 

EQUAL

BONUS: What About The Adjacencies

You didn't ask, but how about the order of the parents?

auto doc3 = json::parse(
    R"([
     { "PARENTS": [99]       , "ID": 85  },
     { "PARENTS": []         , "ID": 2   },
     { "PARENTS": []         , "ID": 1   },
     { "PARENTS": [2]        , "ID": 99  },
     { "PARENTS": [85, 2,  1], "ID": 10  },
])", {}, opts);

Now gives

Live On Coliru

g2 EQUAL
g3 DIFFERENT

In pre-processing again coercing the adjacencies into an ordered container (Live):

 using Parents = std::multiset<int64_t>; // g3 EQUAL

To disallow duplicate edges, use std::set

In the graph model with multisetS doesn't quite do enough because it orders on the descriptor.