c++ c++11 boost-serialization ntl

Error using boost serialization with binary archive


I get the following error while reading from boost::archive::binary_iarchive into my variable:

test-serialization(9285,0x11c62fdc0) malloc: can't allocate region
*** mach_vm_map(size=18014398509486080) failed (error code=3)
test-serialization(9285,0x11c62fdc0) malloc: *** set a breakpoint in malloc_error_break to debug

My serialization and deserialization code are:

template<class Archive>
void save(Archive & archive, const helib::PubKey & pubkey, const unsigned int version){
  BOOST_TEST_MESSAGE("inside save_construct_data");
  archive << &(pubkey.context);
  archive << pubkey.skBounds;
  archive << pubkey.keySwitching;
  archive << pubkey.keySwitchMap;
  archive << pubkey.KS_strategy;
  archive << pubkey.recryptKeyID;
}

template<class Archive>
void load_construct_data(Archive & archive, helib::PubKey * pubkey, const unsigned int version){
  helib::Context * context = new helib::Context(2,3,1); //random numbers since there is no default constructor
  BOOST_TEST_MESSAGE("deserializing context");
  archive >> context;
  std::vector<double> skBounds;
  std::vector<helib::KeySwitch> keySwitching;
  std::vector<std::vector<long>> keySwitchMap;
  NTL::Vec<long> KS_strategy;
  long recryptKeyID;
  BOOST_TEST_MESSAGE("deserializing skbounds");
  archive >> skBounds;
  BOOST_TEST_MESSAGE("deserializing keyswitching");
  archive >> keySwitching;
  BOOST_TEST_MESSAGE("deserializing keyswitchmap");
  archive >> keySwitchMap;
  BOOST_TEST_MESSAGE("deserializing KS_strategy");
  archive >> KS_strategy;
  BOOST_TEST_MESSAGE("deserializing recryptKeyID");
  archive >> recryptKeyID;
  BOOST_TEST_MESSAGE("new pubkey");
  ::new(pubkey)helib::PubKey(*context);
  //TODO: complete
}

template<class Archive>
void serialize(Archive & archive, helib::PubKey & pubkey, const unsigned int version){
  split_free(archive, pubkey, version);
}

template<class Archive>
void load(Archive & archive, helib::PubKey & pubkey, const unsigned int version){
}

The test that calls the code is the following:

BOOST_AUTO_TEST_CASE(serialization_pubkey)
{
  auto context = helibTestContext();
  helib::SecKey secret_key(context);
  secret_key.GenSecKey();
  // Compute key-switching matrices that we need
  helib::addSome1DMatrices(secret_key);
  // Set the secret key (upcast: SecKey is a subclass of PubKey)
  const helib::PubKey& original_pubkey = secret_key;

  std::string filename = "pubkey.serialized";

  std::ofstream os(filename, std::ios::binary);
  {
    boost::archive::binary_oarchive oarchive(os);
    oarchive << original_pubkey;
  }

  helib::PubKey * restored_pubkey = new helib::PubKey(helib::Context(2,3,1));
  {
    std::ifstream ifs(filename, std::ios::binary);
    boost::archive::binary_iarchive iarchive(ifs);
    BOOST_TEST_CHECKPOINT("calling deserialization");
    iarchive >> restored_pubkey;
    BOOST_TEST_CHECKPOINT("done with deserialization");

    // tests omitted
  }
}

Considerations:

  1. Serialization works fine with both boost::archive::text_oarchive and boost::archive::binary_oarchive. They create files of 46M and 21M respectively (big, I know).

  2. Deserialization with boost::archive::text_iarchive stalls while executing archive >> keySwitching; and the process then gets killed automatically. keySwitching is in fact the biggest part of the archive.

  3. I decided to try with boost::archive::binary_iarchive since the file is half the size, but I get the error shown at the beginning. The error happens while executing the first read from the archive: archive >> context;.

  4. The asymmetry between input and output (save and load_construct_data) is because I could not find another way to avoid the implementation of the serialization of a derived class of helib::PubKey. Using a pointer to helib::PubKey was giving me compilation errors asking for the serialization of the derived class. If there is some other way I'm all ears.

Thank you for your help.

UPDATE:

I am implementing deserialization for some classes in the cryptographic library HElib because I need to send ciphertext over the wire. One of these classes is helib::PubKey. I'm using the boost serialization library for the implementation. I have created a gist to provide a reprex as suggested in the comments. There are 3 files:

  1. serialization.hpp, it contains the serialization implementation. Unfortunately, helib::PubKey depends on many other classes, making the file rather long. All the other classes have unit tests that pass. Furthermore, I had to make a tiny modification to the class in order to serialize it: I made its private members public.
  2. test-serialization.cpp, it contains the unit test.
  3. Makefile. Running make creates the executable test-serialization.

Solution

  • vector<bool> strikes again

    It's actually allocating for 0x1fffffffff20000 bits (that's 144 petabits) on my test box. That's coming directly from IndexSet::resize().
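
To put that number in perspective, here is a quick arithmetic check. It is only a sketch: it assumes one bit of storage per element (the usual std::vector<bool> packing), and the helper names are made up for illustration.

```cpp
#include <cstdint>

// Bit count requested on the answerer's test box, as quoted above.
constexpr std::uint64_t requested_bits = 0x1fffffffff20000ULL;

// Hypothetical helper: bytes needed to hold `bits` bits, rounded up.
constexpr std::uint64_t bytes_for_bits(std::uint64_t bits) {
    return bits / 8 + (bits % 8 != 0 ? 1 : 0);
}

// Hypothetical helper: number of whole petabits (10^15 bits) in `bits`.
constexpr std::uint64_t whole_petabits(std::uint64_t bits) {
    return bits / 1000000000000000ULL;
}

// Checked at compile time: about 144 petabits, i.e. about 18 petabytes.
static_assert(whole_petabits(requested_bits) == 144, "about 144 petabits");
static_assert(bytes_for_bits(requested_bits) == 18014398509367296ULL,
              "about 18 petabytes");
```

That is the same order of magnitude as the size=18014398509486080 bytes reported by malloc in the question, so both machines were asked for roughly 16 PiB in a single allocation.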

    Now I have serious questions about HElib using std::vector<bool> here (it seems they would be far better served with something like boost::icl::interval_set<>).

    Well. That was a wild goose chase (that IndexSet serialization can be much improved). However, the real problem is that you had Undefined Behaviour because you don't deserialize the same type as you serialize.

    You serialize a PubKey, but attempt to deserialize as PubKey*. Uhoh.

    Beyond that, there are quite a few other problems.

    Time To Regroup

    I'd use the serialization code from HElib (because, why reinvent the wheel and make a ton of bugs doing so?). If you insist on integration with Boost Serialization, you can have your cake and eat it:

    template <class Archive> void save(Archive& archive, const helib::PubKey& pubkey, unsigned) {
        using V = std::vector<char>;
        using D = iostreams::back_insert_device<V>;
        V data;
        {
            D dev(data);
            iostreams::stream_buffer<D> sbuf(dev);
            std::ostream os(&sbuf); // expose as std::ostream
            helib::writePubKeyBinary(os, pubkey);
        }
        archive << data;
    }
    
    template <class Archive> void load(Archive& archive, helib::PubKey& pubkey, unsigned) {
        std::vector<char> data;
        archive >> data;
        using S = iostreams::array_source;
        S source(data.data(), data.size());
        iostreams::stream_buffer<S> sbuf(source);
        {
            std::istream is(&sbuf); // expose as std::istream
            helib::readPubKeyBinary(is, pubkey);
        }
    }
    

    That's all. 24 lines of code. And it's gonna be tested and maintained by the library authors. You can't beat that (clearly). I've modified the tests a bit so we don't abuse private details anymore.

    Cleaning Up The Code

    By separating out a helper to deal with the blob writing, we can implement different helib types in a very similar way:

    namespace helib { // leverage ADL
        template <class A> void save(A& ar, const Context& o, unsigned) {
            Blob data = to_blob(o, writeContextBinary);
            ar << data;
        }
        template <class A> void load(A& ar, Context& o, unsigned) {
            Blob data;
            ar >> data;
            from_blob(data, o, readContextBinary);
        }
        template <class A> void save(A& ar, const PubKey& o, unsigned) {
            Blob data = to_blob(o, writePubKeyBinary);
            ar << data;
        }
        template <class A> void load(A& ar, PubKey& o, unsigned) {
            Blob data;
            ar >> data;
            from_blob(data, o, readPubKeyBinary);
        }
    }
    

    This is elegance to me.
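
The snippet above relies on Blob, to_blob and from_blob without showing them. The version below is a minimal sketch of what such helpers could look like, not the code from the gist: it assumes Blob is a std::vector<char> and uses plain std::stringstream instead of Boost.Iostreams. The Toy type and its writeToyBinary/readToyBinary functions are hypothetical stand-ins for helib::PubKey and helib::writePubKeyBinary/helib::readPubKeyBinary.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Blob = std::vector<char>; // assumed representation of the blob

// Run `writer(os, obj)` against an in-memory stream and return the bytes.
template <class T, class Writer>
Blob to_blob(const T& obj, Writer writer) {
    std::ostringstream os;
    writer(os, obj);
    const std::string bytes = os.str();
    return Blob(bytes.begin(), bytes.end());
}

// Feed the bytes back through `reader(is, obj)` to refill `obj` in place.
template <class T, class Reader>
void from_blob(const Blob& data, T& obj, Reader reader) {
    const std::string bytes(data.begin(), data.end());
    std::istringstream is(bytes);
    reader(is, obj);
    assert(!is.fail()); // the reader should not run out of input
}

// Hypothetical stand-in type with stream-based binary I/O.
struct Toy {
    long a;
    long b;
};

inline void writeToyBinary(std::ostream& os, const Toy& t) {
    os.write(reinterpret_cast<const char*>(&t.a), sizeof t.a);
    os.write(reinterpret_cast<const char*>(&t.b), sizeof t.b);
}

inline void readToyBinary(std::istream& is, Toy& t) {
    is.read(reinterpret_cast<char*>(&t.a), sizeof t.a);
    is.read(reinterpret_cast<char*>(&t.b), sizeof t.b);
}
```

With these in place, to_blob(toy, writeToyBinary) yields a byte vector that an archive can store like any other std::vector<char>, and from_blob(blob, toy, readToyBinary) restores the object. The copy through std::string costs an extra pass over the data, which is the price of avoiding the Boost.Iostreams devices in this sketch.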

    FULL LISTING

    I have cloned a new gist https://gist.github.com/sehe/ba82a0329e4ec586363eb82d3f3b9326 that includes the following change-sets:

    0079c07 Make it compile locally
    b3b2cf1 Squelch the warnings
    011b589 Endof investigations, regroup time
    
    f4d79a6 Reimplemented using HElib binary IO
    a403e97 Bitwise reproducible outputs
    

    Only the last two commits contain changes related to the actual fixes.

    I'll list the full code here too for posterity. There are a number of subtle reorganizations and ditto comments in the test code. You'd do well to read through them carefully to see whether you understand them and the implications suit your needs. I left comments describing why the test assertions are what they are to help.

    TEST OUTPUT

    time ./test-serialization -l all -r detailed
    Running 1 test case...
    Entering test module "main"
    test-serialization.cpp(34): Entering test case "serialization_pubkey"
    test-serialization.cpp(61): info: check (context == surrogate) has passed
    test-serialization.cpp(70): info: check (indep_pk != original_pubkey) has passed
    test-serialization.cpp(82): info: check (restored_pubkey == original_pubkey) has passed
    test-serialization.cpp(34): Leaving test case "serialization_pubkey"; testing time: 36385217us
    Leaving test module "main"; testing time: 36385273us
    
    Test module "main" has passed with:
      1 test case out of 1 passed
      3 assertions out of 3 passed
    
      Test case "serialization_pubkey" has passed with:
        3 assertions out of 3 passed
    
    real    0m36,698s
    user    0m35,558s
    sys     0m0,850s
    

    Bitwise Reproducible Outputs

    On repeated serialization it appears that indeed the output is bitwise identical, which may be an important property:

    sha256sum pubkey.serialized*
    66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized
    66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.2
    66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.3
    66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.4
    

    Note that it is (obviously) not identical across runs (because it generates different key material).
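
If you want the same check inside a test rather than through sha256sum, a byte-for-byte file comparison is enough. The helper below is a sketch with a made-up name (files_identical); it reads both files fully into memory, which is acceptable for archives of a few tens of megabytes.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true if both files open and hold exactly the same bytes.
bool files_identical(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b)
        return false; // a file that cannot be opened never counts as identical

    using It = std::istreambuf_iterator<char>;
    const std::string bytes_a((It(a)), It());
    const std::string bytes_b((It(b)), It());
    return bytes_a == bytes_b; // compares both length and content
}
```

A test could then serialize the same key twice and require files_identical("pubkey.serialized", "pubkey.serialized.2").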

    Side Quest (The Wild Goose Chase)

    One way to improve the IndexSet serialization code manually is to also use vector<bool>:

    template<class Archive>
        void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
            std::vector<bool> elements;
            elements.resize(index_set.last()-index_set.first()+1);
            for (auto n : index_set)
                elements[n-index_set.first()] = true;
            archive << index_set.first() << elements;
        }
    
    template<class Archive>
        void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version){
            long first_ = 0;
            std::vector<bool> elements;
            archive >> first_ >> elements;
            index_set.clear();
            for (size_t n = 0; n < elements.size(); ++n) {
                if (elements[n])
                    index_set.insert(n+first_);
            }
        }
    

    A better idea would be to use dynamic_bitset, for which I happen to have contributed the serialization code (see "How to serialize boost::dynamic_bitset?"):

    template<class Archive>
        void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
            boost::dynamic_bitset<> elements;
            elements.resize(index_set.last()-index_set.first()+1);
            for (auto n : index_set)
                elements.set(n-index_set.first());
            archive << index_set.first() << elements;
        }
    
    template<class Archive>
        void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version) {
            long first_ = 0;
            boost::dynamic_bitset<> elements;
            archive >> first_ >> elements;
            index_set.clear();
            for (size_t n = elements.find_first(); n != boost::dynamic_bitset<>::npos; n = elements.find_next(n))
                index_set.insert(n+first_);
        }
    

    Of course, you would likely have to do similar things for IndexMap.
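
Both IndexSet variants above use the same encoding: store the smallest element, then one flag per value between the smallest and the largest. The round trip can be tried in isolation with the sketch below, where std::set<long> stands in for helib::IndexSet and the DenseBits, encode and decode names are made up for illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Compact form: the smallest element plus one flag per value in [first, last].
struct DenseBits {
    long first = 0;
    std::vector<bool> present;
};

// std::set<long> stands in for helib::IndexSet (an assumption of this sketch).
DenseBits encode(const std::set<long>& s) {
    DenseBits d;
    if (s.empty())
        return d; // empty set: no flags at all
    d.first = *s.begin();
    d.present.resize(static_cast<std::size_t>(*s.rbegin() - d.first + 1));
    for (long n : s)
        d.present[static_cast<std::size_t>(n - d.first)] = true;
    return d;
}

// Rebuild the set by shifting every set flag back by `first`.
std::set<long> decode(const DenseBits& d) {
    std::set<long> s;
    for (std::size_t i = 0; i < d.present.size(); ++i)
        if (d.present[i])
            s.insert(d.first + static_cast<long>(i));
    return s;
}
```

The stored size grows with the span between the smallest and largest element rather than with the number of elements, so this encoding suits dense sets and wastes space on sparse ones with large gaps.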