c++boostboost-graph

Boost Graphs - deterministic graph behaviour from JSON


I'm creating graphs from a JSON array file and I'd like it so that when I traverse through both edges and vertices (e.g. in_edges(), vertices() or topological_sort() ) of the graph, it's in the exact same order regardless of the insertion order. In other words, to produce the same graph even if I shuffle the order of the JSON.

The JSON could look like this:

[
    {
        "ID": 10,
        "PARENTS": [2, 1, 85]
    },
    {
        "ID": 1,
        "PARENTS": []
    },
    {
        "ID": 99,
        "PARENTS": [2]
    },
    {
        "ID": 2,
        "PARENTS": []
    }, 
    {
        "ID": 85,
        "PARENTS": [99]
    },
]

As an example, if the insertion order was as the example JSON, 10 -> 1 -> 99 -> 2 -> 85, then the parents of 10 would come up as 1 2 85. However, if the insertion order was instead 85 -> 2 -> 1 -> 99 -> 10, then the parents of 10 would come up as 85 2 1. This is what I'd like to avoid.

Right now I'm considering adding a wrapper around a boost graph and sorting the vertices with a set when you call getParent() for example but I wanted to know if there was an easier way to do achieve this with boost natively.

Another possibility is to sort the JSON beforehand but I'd like to avoid that time complexity.


Solution

  • I'd like to avoid that time complexity

    Given the laws of thermodynamics, this cannot be avoided. You explicitly state that you want normalized/canonical results, which means you have to expend the energy to ... normalize the data, regardless of whether it's done in pre- or post-processing.

    Prefer to do it early, so the effort is not repeated.

    Remember Boost Graph is already deterministic. What you want is to treat the input in normalized fashion. Let's reproduce the issue first, given two JSON documents doc1 and doc2 from the question:

    #include <boost/json.hpp>
    namespace json = boost::json;
    
    auto opts = json::parse_options{.allow_trailing_commas=true};
    auto doc1 = json::parse(
        R"( [
        { "ID": 10, "PARENTS": [2,  1, 85] },
        { "ID": 1,  "PARENTS": []   },
        { "ID": 99, "PARENTS": [2]  },
        { "ID": 2,  "PARENTS": []   },
        { "ID": 85, "PARENTS": [99] },
    ])", {}, opts);
    
    auto doc2 = json::parse(
        R"([
         { "PARENTS": [99]       , "ID": 85  },
         { "PARENTS": []         , "ID": 2   },
         { "PARENTS": []         , "ID": 1   },
         { "PARENTS": [2]        , "ID": 99  },
         { "PARENTS": [2,  1, 85], "ID": 10  },
    ])", {}, opts);
    

    If we 1:1 translate these to boost::adjacency_list<>:

    #include <boost/graph/adjacency_list.hpp>
    using Id = int64_t;
    
    namespace Builder {
        struct Node {
            using Parents = std::vector<int64_t>;
            Id      id;
            Parents parents;
    
            friend Node tag_invoke(json::value_to_tag<Node>, json::value const& j) {
                return {j.at("ID").as_int64(), value_to<Parents>(j.at("PARENTS"))};
            }
        };
    
        auto Read(json::value const& doc) {
            boost::adjacency_list<boost::vecS, boost::listS, boost::directedS, Id> g;
    
            for (auto& [id, parents] : value_to<std::vector<Node>>(doc))
                for (auto parent : parents)
                    add_edge(parent, id, g);
    
            return g;
        }
    } // namespace Builder
    

    We indeed see the different graphs as expected: Live On Coliru

    int main() {
        std::cout << doc1 << "\n";
        std::cout << doc2 << "\n";
        std::cout << (doc1 == doc2 ? "EQUAL" : "DIFFERENT") << "\n";
    
        auto g1 = Builder::Read(doc1);
        auto g2 = Builder::Read(doc2);
    
        std::cout << g1 << "\n";
        std::cout << g2 << "\n";
    
        auto s1 = boost::lexical_cast<std::string>(g1);
        auto s2 = boost::lexical_cast<std::string>(g2);
        std::cout << (s1 == s2 ? "EQUAL" : "DIFFERENT") << "\n";
    }
    

    Printing

    [{"ID":10,"PARENTS":[2,1,85]},{"ID":1,"PARENTS":[]},{"ID":99,"PARENTS":[2]},{"ID":2,"PARENTS":[]},{"ID":85,"PARENTS":[99]}]
    [{"PARENTS":[99],"ID":85},{"PARENTS":[],"ID":2},{"PARENTS":[],"ID":1},{"PARENTS":[2],"ID":99},{"PARENTS":[2,1,85],"ID":10}]
    DIFFERENT
    10 --> 
    2 --> 10 99 
    1 --> 10 
    85 --> 10 
    99 --> 85 
    
    85 --> 10 
    99 --> 85 
    2 --> 99 10 
    10 --> 
    1 --> 10 
    
    DIFFERENT
    

    Preprocessing Nodes

    In this case we can very simply cause the nodes to be inserted in a canonical ordering of our choosing by replacing:

    for (auto& [id, parents] : value_to<std::vector<Node>>(doc))
    

    with

    for (auto& [id, parents] : value_to<std::set<Node>>(doc))
    

    For simplicity let's go with the default ordering based on each node properties:

    struct Node { /*...*/
        auto operator<=>(Node const&) const = default;
    

    Now we get Live On Coliru

    // ...
    10 --> 
    2 --> 10 99 
    1 --> 10 
    85 --> 10 
    99 --> 85 
    
    EQUAL
    

    BONUS: What About The Adjacencies

    You didn't ask, but how about the order of the parents?

    auto doc3 = json::parse(
        R"([
         { "PARENTS": [99]       , "ID": 85  },
         { "PARENTS": []         , "ID": 2   },
         { "PARENTS": []         , "ID": 1   },
         { "PARENTS": [2]        , "ID": 99  },
         { "PARENTS": [85, 2,  1], "ID": 10  },
    ])", {}, opts);
    

    Now gives

    Live On Coliru

    g2 EQUAL
    g3 DIFFERENT
    

    In pre-processing again coercing the adjacencies into an ordered container (Live):

     using Parents = std::multiset<int64_t>; // g3 EQUAL
    
    • To disallow duplicate edges, use std::set
    • In the graph model with multisetS doesn't quite do enough because it orders on the descriptor.