
How does Boost.Serialization store user-defined classes in archives?


I have a user-defined object (call it Foo) which consists of some primitive variables, as well as other (external library) objects which already contain implementations of the serialize function. I would like to know how the archive files are structured, and whether that structure is general (e.g. between text archives and binary archives). When I open a text archive in a text editor, the first characters are 22 serialization::archive 19 0 0. What follows appears to be all of the data from my user-defined object.

My question is: what do those initial characters correspond to? And if I try to read the contents of my file into something which is not Foo, as follows:

bool bool1;
bool bool2;

ia >> bool1 >> bool2;

it seems to just output the first few zeros -- does it do that for any archive? And if so, is it reasonable to read out all of the member variables of Foo from that archive by just reading into the appropriate types as in:

bool bool1;
bool bool2;
double Foo_member1;
std::string Foo_member2;

ia >> bool1 >> bool2 >> Foo_member1 >> Foo_member2;

And finally, does this structure carry over to other kinds of archive as well (e.g. a binary archive)?


Solution

  • it seems to just output the first few zeros

    What do you mean? You serialized two variables with indeterminate values (you never initialized them). You should not be expecting zeroes. Nor should you be expecting any particular layout (it is determined by archive type, version, library version(s) and platform architecture).

    I would like to know how the archive files are structured, and whether that structure is general

    It depends on the archive. It is "general" (for some definition of general). However, most archive formats include only minimal metadata describing their structure: they cannot be "visited" or "traversed" in non-sequential fashion, and there is no "reflectable" schema of any kind. If you need these kinds of features, look at other serialization formats (protobuf, msgpack, bson, XML/XSD, etc.).
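
    As for the leading characters of a text archive, a minimal sketch can make them visible. It writes the same two values twice, once with the default header and once with the `boost::archive::no_header` flag. The helper names `two_ints_as_text` and `compare_headers` are made up for this illustration, and the exact text produced depends on your Boost version.

    ```cpp
    #include <boost/archive/text_oarchive.hpp>
    #include <cassert>
    #include <sstream>
    #include <string>

    // Hypothetical helper: write two ints to a text archive, return the raw text.
    std::string two_ints_as_text(int a, int b, unsigned flags = 0) {
        std::ostringstream os;
        {
            boost::archive::text_oarchive oa(os, flags);
            oa << a << b;
        } // the archive is only complete once it has been destroyed
        return os.str();
    }

    // Hypothetical self-check: the signature is present only when the header is.
    void compare_headers() {
        std::string const with_header = two_ints_as_text(42, 99);
        std::string const bare =
            two_ints_as_text(42, 99, boost::archive::no_header);

        assert(with_header.find("serialization::archive") != std::string::npos);
        assert(bare.find("serialization::archive") == std::string::npos);
    }
    ```

    In the text format a string is written as its length followed by its characters, so `22 serialization::archive` is the 22-character archive signature. The number after it is the archive format version, the same 19 that appears as `version="19"` in the XML output further down.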

    The one archive type that will most satisfy your curiosity is the XML archive type. You will find that you are required to supply semantic information for the XML elements, e.g.


    #include <boost/archive/xml_oarchive.hpp>
    #include <iostream>
    
    int main() {
        using boost::serialization::make_nvp;
        {
            boost::archive::xml_oarchive oa(std::cout);
    
            int a = 42, b = 99, c = a * b;
            oa << make_nvp("a", a) << make_nvp("b", b) << make_nvp("c", c);
        } // closes the archive
    }
    

    Which results in

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <!DOCTYPE boost_serialization>
    <boost_serialization signature="serialization::archive" version="19">
      <a>42</a>
      <b>99</b>
      <c>4158</c>
    </boost_serialization>
    

    This at least gives some recognizable context to the version number (19) and similar metadata, such as the information related to Object Tracking and Identity, RTTI mapping, and exported class keys.
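
    To see where the per-class metadata ends up, here is a small sketch that gives a class an explicit version with `BOOST_CLASS_VERSION` and captures the XML archive as a string. The type `Sketch::Point` and the function `point_as_xml` are hypothetical names used only for this illustration.

    ```cpp
    #include <boost/archive/xml_oarchive.hpp>
    #include <boost/serialization/nvp.hpp>
    #include <boost/serialization/version.hpp>
    #include <cassert>
    #include <sstream>
    #include <string>

    namespace Sketch { // hypothetical namespace for this illustration
        struct Point {
            int x = 1;
            int y = 2;
        };

        template <typename Ar>
        void serialize(Ar& ar, Point& p, unsigned /*version*/)
        {
            using boost::serialization::make_nvp;
            ar & make_nvp("x", p.x) & make_nvp("y", p.y);
        }
    } // namespace Sketch

    // Class version 3; this is stored next to the class metadata in the archive.
    BOOST_CLASS_VERSION(Sketch::Point, 3)

    std::string point_as_xml() {
        std::ostringstream os;
        {
            boost::archive::xml_oarchive oa(os);
            Sketch::Point p;
            oa << boost::serialization::make_nvp("point", p);
        } // closes the archive
        std::string const xml = os.str();
        assert(!xml.empty());
        return xml;
    }
    ```

    The `point` element should then carry a `version="3"` attribute, separate from the archive-level version in the `boost_serialization` root element.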

    And if so, is it reasonable to read out all of the member variables of Foo from that archive by just reading into the appropriate types

    Yes but maybe no. The contract you have to fulfill is to (de)serialize the exact same types of data in the exact same order. Any way you manage that is fine. The typical way is to implement serialize for your type, not to serialize the members separately (Law Of Demeter and simple Encapsulation):


    #include <boost/archive/xml_oarchive.hpp>
    #include <string>
    
    namespace MyLib {
        struct Foo {
            bool        bool1;
            bool        bool2;
            double      Foo_member1;
            std::string Foo_member2;
        };
    
        template <typename Ar>
        void serialize(Ar& ar, Foo& foo, unsigned /*version*/)
        {
            ar& BOOST_NVP(foo.bool1) & BOOST_NVP(foo.bool2) &
                BOOST_NVP(foo.Foo_member1) & BOOST_NVP(foo.Foo_member2);
        }
    } // namespace MyLib
    
    #include <cmath> // for M_PI...
    #include <iostream>
    int main() {
        using boost::serialization::make_nvp;
        {
            boost::archive::xml_oarchive oa(std::cout);
    
            MyLib::Foo foo{true, false, M_PI, "Hello world"};
            oa << make_nvp("foo", foo);
        } // closes the archive
    }
    

    Prints

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <!DOCTYPE boost_serialization>
    <boost_serialization signature="serialization::archive" version="19">
    <foo class_id="0" tracking_level="0" version="0">
        <foo.bool1>1</foo.bool1>
        <foo.bool2>0</foo.bool2>
        <foo.Foo_member1>3.14159265358979312e+00</foo.Foo_member1>
        <foo.Foo_member2>Hello world</foo.Foo_member2>
    </foo>
    </boost_serialization>
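
    The "same types, same order" contract is easiest to honor when one `serialize` function drives both saving and loading. The following is a minimal round-trip sketch through a text archive held in a `std::stringstream`. `Sketch::Foo`, `round_trip` and `check_round_trip` are hypothetical names chosen for this illustration, not part of the question's code.

    ```cpp
    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/string.hpp>
    #include <cassert>
    #include <sstream>
    #include <string>

    namespace Sketch { // hypothetical stand-in for the asker's Foo
        struct Foo {
            bool        bool1 = false;
            bool        bool2 = false;
            double      Foo_member1 = 0.0;
            std::string Foo_member2;
        };

        // One function for both directions, so the order cannot drift apart.
        template <typename Ar>
        void serialize(Ar& ar, Foo& foo, unsigned /*version*/)
        {
            ar & foo.bool1 & foo.bool2 & foo.Foo_member1 & foo.Foo_member2;
        }
    } // namespace Sketch

    // Save `in` to an in-memory text archive, then load it into a fresh object.
    Sketch::Foo round_trip(Sketch::Foo const& in) {
        std::stringstream ss;
        {
            boost::archive::text_oarchive oa(ss);
            oa << in;
        } // closes the output archive
        Sketch::Foo out;
        {
            boost::archive::text_iarchive ia(ss);
            ia >> out;
        }
        return out;
    }

    // Self-check: every member survives the round trip.
    void check_round_trip() {
        Sketch::Foo in;
        in.bool1 = true;
        in.Foo_member1 = 0.5;
        in.Foo_member2 = "Hello world";

        Sketch::Foo const out = round_trip(in);
        assert(out.bool1 == in.bool1 && out.bool2 == in.bool2);
        assert(out.Foo_member1 == in.Foo_member1);
        assert(out.Foo_member2 == in.Foo_member2);
    }
    ```

    Reading the members one by one with `ia >> bool1 >> bool2 >> ...` instead of `ia >> out` would not match what was written here, because saving a class through `serialize` also stores class-level metadata ahead of the members.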
    

    CAVEAT

    It sounds a little like you're mistaking archives for random-access data streams with some standard format. For more, see "Archives are not streams".