Tags: c++, serialization, boost, boost-serialization

Boost serialization: add new register_type without breaking compatibility


I use Boost.Serialization to serialize a NeuralNetwork object with code like this:

template <class Archive>
void NeuralNetwork::serialize(Archive& ar, unsigned version)
{
    boost::serialization::void_cast_register<NeuralNetwork, StatisticAnalysis>();
    ar & boost::serialization::base_object<StatisticAnalysis>(*this);
    ar.template register_type<FullyConnected>(); // derived from Layer object
    ar.template register_type<Recurrence>();
    ar.template register_type<Convolution>();
    ar.template register_type<MaxPooling>();
    ar & layers; // vector<unique_ptr<Layer>>
}

My problem is that I already have serialized objects, and when I add a new class derived from Layer, I get the following error: unknown file: error: C++ exception with description "unregistered class" thrown in the test body.

How can I add a new register_type<T> without breaking compatibility with already serialized and saved objects?


Solution

  • when I add a new class inherited from Layer, I have the following error: unknown file: error: C++ exception with description "unregistered class" thrown in the test body.

    I'd argue that this error has a different cause than the added register_type call.

    Reference Point: "Automatic" type registration

    The typical pattern is NOT to use register_type. Instead you'd use the automatic registration mechanism: https://www.boost.org/doc/libs/1_32_0/libs/serialization/doc/special.html#registration

    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/text_iarchive.hpp>
    #include <boost/serialization/base_object.hpp>
    #include <boost/serialization/export.hpp>
    #include <boost/serialization/unique_ptr.hpp>
    #include <boost/serialization/access.hpp>
    #include <boost/serialization/vector.hpp>
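    #include <sstream>
    #include <iostream>
    #include <iomanip>
    #include <memory>   // std::unique_ptr, std::make_unique
    #include <typeinfo> // typeid
    #include <vector>
    #include <boost/core/demangle.hpp>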
    using boost::serialization::base_object;
    using boost::core::demangle;
    
    struct StatisticAnalysis {
        virtual ~StatisticAnalysis() = default;
        virtual void report(std::ostream&) const = 0;
        std::vector<int> base_data {1,2,3};
        void serialize(auto& ar, unsigned) { ar & base_data; }
    
        friend std::ostream& operator<<(std::ostream& os, StatisticAnalysis const& sa) {
            sa.report(os);
            return os;
        }
    };
    
    BOOST_SERIALIZATION_ASSUME_ABSTRACT(StatisticAnalysis)
    BOOST_CLASS_EXPORT(StatisticAnalysis)
    
    struct Layer {
        virtual ~Layer() = default;
        void serialize(auto&, unsigned) { }
    };
    
    BOOST_SERIALIZATION_ASSUME_ABSTRACT(Layer)
    BOOST_CLASS_EXPORT(Layer)
    
    struct FullyConnected : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    struct Recurrence     : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    struct Convolution    : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    struct MaxPooling     : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    
    BOOST_CLASS_EXPORT(FullyConnected)
    BOOST_CLASS_EXPORT(Recurrence)
    BOOST_CLASS_EXPORT(Convolution)
    BOOST_CLASS_EXPORT(MaxPooling)
    
    #if defined(VERSION2)
    struct NewLayer : Layer {
        void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); }
    };
    BOOST_CLASS_EXPORT(NewLayer)
    #endif
    
    struct NeuralNetwork : StatisticAnalysis {
        virtual void report(std::ostream& os) const override {
            os << layers.size() << " layers: {";
            for (auto& layer : layers) {
                os << " " << demangle(typeid(*layer).name());
            }
            os << " }\n";
        }
    
        std::vector<std::unique_ptr<Layer> > layers;
    
        void serialize(auto& ar, unsigned) {
            ar &base_object<StatisticAnalysis>(*this);
            ar &layers;
        }
    };
    
    BOOST_CLASS_EXPORT(NeuralNetwork)
    
    int main()
    {
        std::unique_ptr<StatisticAnalysis> analysis;
        std::stringstream ss;
        {
            boost::archive::text_oarchive oa(ss);
            analysis = [] {
                auto nn = std::make_unique<NeuralNetwork>();
                nn->layers.emplace_back(std::make_unique<FullyConnected>());
                nn->layers.emplace_back( std::make_unique<Recurrence>());
                nn->layers.emplace_back(std::make_unique<Convolution>());
                nn->layers.emplace_back(std::make_unique<FullyConnected>());
                nn->layers.emplace_back(std::make_unique<FullyConnected>());
                nn->layers.emplace_back(std::make_unique<MaxPooling>());
                return nn;
            }();
            oa << analysis;
        }
    
        std::cout << "Data: " << std::quoted(ss.str()) << "\n";
    
        {
            boost::archive::text_iarchive ia(ss);
    
            analysis.reset();
            ia >> analysis;
            
            std::cerr << *analysis << "\n";
        }
    
    }
    

    Both versions (with and without VERSION2 defined) produce identical archives:

    Data: "22 serialization::archive 17 0 0 1 13 NeuralNetwork 1 0
    0 0 0 3 0 1 2 3 0 0 6 0 0 0 7 14 FullyConnected 1 0
    1 1 0
    2 8 10 Recurrence 1 0
    3
    4 9 11 Convolution 1 0
    5
    6 7
    7
    8 7
    9
    10 10 10 MaxPooling 1 0
    11
    12
    "
    6 layers: { FullyConnected Recurrence Convolution FullyConnected FullyConnected MaxPooling }
    

    Compare With register_type

    Just making sure that register_type doesn't actually create a compatibility problem, as the docs could be read to imply:

    Note that if the serialization function is split between save and load, both functions must include the registration. This is required to keep the save and corresponding load in synchronization.
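One way to follow that advice is to keep the registrations in a single helper that both save and load call, so the two cannot drift apart. The sketch below is a minimal illustration under stated assumptions: RecordingArchive is a hypothetical stand-in that only records the order of register_type calls (it is not a Boost archive), and register_layers is a helper name made up for this example.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>
#include <vector>

struct Layer { virtual ~Layer() = default; };
struct FullyConnected : Layer {};
struct Recurrence     : Layer {};
struct Convolution    : Layer {};
struct MaxPooling     : Layer {};

// Hypothetical stand-in for an archive: it only records which types were
// registered, and in which order. A real Boost archive does far more.
struct RecordingArchive {
    std::vector<std::string> registered;

    template <class T>
    void register_type() { registered.push_back(typeid(T).name()); }
};

// Single source of truth for the registrations (helper name is an assumption).
template <class Archive>
void register_layers(Archive& ar) {
    ar.template register_type<FullyConnected>();
    ar.template register_type<Recurrence>();
    ar.template register_type<Convolution>();
    ar.template register_type<MaxPooling>();
}

struct NeuralNetwork {
    // With Boost, BOOST_SERIALIZATION_SPLIT_MEMBER() would dispatch to these.
    template <class Archive>
    void save(Archive& ar, unsigned /*version*/) const {
        register_layers(ar);
        // ar << layers; would follow here with a real archive
    }

    template <class Archive>
    void load(Archive& ar, unsigned /*version*/) {
        register_layers(ar);
        // ar >> layers; would follow here with a real archive
    }
};
```

Because both functions go through the same helper, a later addition only has to be made in one place.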

    Note: After seeing that the output was identical (as expected) for text archives, I also changed the program to write a binary archive, just in case implementation differences are at play there.

    Live Demo (both versions v1 and v2 at once):

    //#define VERSION2
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/serialization/base_object.hpp>
    #include <boost/serialization/export.hpp>
    #include <boost/serialization/unique_ptr.hpp>
    #include <boost/serialization/access.hpp>
    #include <boost/serialization/vector.hpp>
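    #include <fstream>
    #include <iostream>
    #include <memory>   // std::unique_ptr, std::make_unique
    #include <string>
    #include <typeinfo> // typeid
    #include <vector>
    #include <boost/core/demangle.hpp>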
    using boost::serialization::base_object;
    using boost::core::demangle;
    
    struct StatisticAnalysis {
        virtual ~StatisticAnalysis() = default;
        virtual void report(std::ostream&) const = 0;
        std::vector<int> base_data {1,2,3};
        void serialize(auto& ar, unsigned) { ar & base_data; }
    
        friend std::ostream& operator<<(std::ostream& os, StatisticAnalysis const& sa) {
            sa.report(os);
            return os;
        }
    };
    
    BOOST_SERIALIZATION_ASSUME_ABSTRACT(StatisticAnalysis)
    BOOST_CLASS_EXPORT(StatisticAnalysis)
    
    struct Layer {
        virtual ~Layer() = default;
        void serialize(auto&, unsigned) { }
    };
    
    BOOST_SERIALIZATION_ASSUME_ABSTRACT(Layer)
    BOOST_CLASS_EXPORT(Layer)
    
    struct FullyConnected : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    struct Recurrence     : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    struct Convolution    : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    struct MaxPooling     : Layer { void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); } };
    
    //BOOST_CLASS_EXPORT(FullyConnected)
    //BOOST_CLASS_EXPORT(Recurrence)
    //BOOST_CLASS_EXPORT(Convolution)
    //BOOST_CLASS_EXPORT(MaxPooling)
    
    #if defined(VERSION2)
    struct NewLayer : Layer {
        void serialize(auto &ar, unsigned) { ar &base_object<Layer>(*this); }
    };
    //BOOST_CLASS_EXPORT(NewLayer)
    #endif
    
    struct NeuralNetwork : StatisticAnalysis {
        virtual void report(std::ostream& os) const override {
            os << layers.size() << " layers: {";
            for (auto& layer : layers) {
                os << " " << demangle(typeid(*layer).name());
            }
            os << " }\n";
        }
    
        std::vector<std::unique_ptr<Layer> > layers;
    
        void serialize(auto& ar, unsigned) {
            ar &base_object<StatisticAnalysis>(*this);
            ar.template register_type<FullyConnected>(); // derived from Layer object
            ar.template register_type<Recurrence>();
            ar.template register_type<Convolution>();
            ar.template register_type<MaxPooling>();
    #if defined(VERSION2)
            ar.template register_type<NewLayer>();
    #endif
    
            ar &layers;
        }
    };
    
    BOOST_CLASS_EXPORT(NeuralNetwork)
    
    int main(int, char **argv) {
        std::string program_name(*argv);
    
        std::unique_ptr<StatisticAnalysis> analysis;
        {
            std::ofstream ofs(program_name + ".bin", std::ios::binary);
            boost::archive::binary_oarchive oa(ofs);
            analysis = [] {
                auto nn = std::make_unique<NeuralNetwork>();
                nn->layers.emplace_back(std::make_unique<FullyConnected>());
                nn->layers.emplace_back( std::make_unique<Recurrence>());
                nn->layers.emplace_back(std::make_unique<Convolution>());
                nn->layers.emplace_back(std::make_unique<FullyConnected>());
                nn->layers.emplace_back(std::make_unique<FullyConnected>());
                nn->layers.emplace_back(std::make_unique<MaxPooling>());
                return nn;
            }();
            oa << analysis;
        }
    
        {
            std::ifstream ifs(program_name + ".bin", std::ios::binary);
            boost::archive::binary_iarchive ia(ifs);
    
            analysis.reset();
            ia >> analysis;
            
            std::cerr << *analysis << "\n";
        }
    }
    

    The test commands

    g++ -std=c++20 -Os -DVERSION1 main.cpp -lboost_serialization -o v1
    g++ -std=c++20 -Os -DVERSION2 main.cpp -lboost_serialization -o v2
    ./v1 && ./v2 && md5sum v1.bin v2.bin
    

    Complete successfully, writing identical archives v1.bin and v2.bin, as evidenced by their md5sums:

    5bba3ef7d8a25bd50d0768fed5dfed64  v1.bin
    5bba3ef7d8a25bd50d0768fed5dfed64  v2.bin
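
One plausible reading of this result, offered as an assumption rather than a description of Boost internals: if ids are handed out in registration order, then registering NewLayer after the four existing types leaves their ids untouched, so data that never contains a NewLayer looks the same in both versions. The toy model below illustrates only that idea; ToyRegistry and make_registry are hypothetical names and do not exist in Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (NOT Boost): each newly registered name gets the next free id.
class ToyRegistry {
public:
    std::size_t register_type(const std::string& name) {
        for (std::size_t id = 0; id < names_.size(); ++id) {
            if (names_[id] == name) return id;  // already registered
        }
        names_.push_back(name);
        return names_.size() - 1;
    }

    // Throws std::out_of_range for an id that was never handed out.
    const std::string& name_of(std::size_t id) const { return names_.at(id); }

    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};

// Mirrors the registration order used in NeuralNetwork::serialize above;
// the new type is registered last, as in the VERSION2 build.
inline ToyRegistry make_registry(bool with_new_layer) {
    ToyRegistry r;
    r.register_type("FullyConnected");
    r.register_type("Recurrence");
    r.register_type("Convolution");
    r.register_type("MaxPooling");
    if (with_new_layer) r.register_type("NewLayer");
    return r;
}
```

In this model, make_registry(false) and make_registry(true) agree on ids 0 through 3. Registering the new type in the middle of the list instead would shift the later ids, which is the kind of change to avoid if the assumption holds.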
    

    Summary - Where To Go From Here

    I think that adding subclasses should not, in principle, break archive compatibility. If it appears that it did, I'd look for the cause elsewhere.

    I'm around should you find more information. If the question becomes different enough, consider posting a new question.