I have a C++ class representing a hierarchically organised data tree which is very large (~Gb, basically as large as I can get away with in memory). It uses an STL list to store information at each node plus iterators to other nodes. Each node has only one parent, but 0-10 children. Abstracted, it looks something like:
struct node {
public:
node_list_iterator parent; // iterator to a single parent node
double node_data_array[X];
map<int,node_list_iterator> children; // iterators to child nodes
};
class strategy {
private:
list<node> tree; // hierarchically linked list of nodes
struct some_other_data;
public:
void build(); // build the tree
void save(); // save the tree from disk
void load(); // load the tree from disk
void use(); // use the tree
};
I would like to implement the load() and save() to disk, and it should be fairly fast, however the obvious problems are:
I don't know the size in advance;
The data contains iterators, which are volatile;
My ignorance of C++ is prodigious.
Could anyone suggest a pure C++ solution please?
It seems like you could save the data in the following syntax:
File = Meta-data Node
Node = Node-data ChildCount NodeList
NodeList = sequence (int, Node)
That is to say, when serialized the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.
Reading isn't that much harder. std::list<node>
iterators are stable. Once you've inserted the root node, its iterator will not change, not even when inserting its children. Hence, when you're reading each node you can already set the parent iterator. This of course leaves you with the child iterators, but those are trivial: each node is a child of its parents. So, after you've read all nodes you'll fix up the child iterators. Start with the second node, the first child (The first node one was the root) and iterate to the last child. Then, for each child C, get its parent and the child to its parent's collection. Now, this means that you have to set the int
child IDs aside while reading, but you can do that in a simple std::vector parallel to the std::list<node>
. Once you've patched all child IDs in the respective parents, you can discard the vector.