c++ c++11 serialization cereal

cereal serialization and polymorphism


Ok so I'm running into a problem in C++11 with cereal (http://uscilab.github.io/cereal/).

In an abstract sense I have a large graph which I am serializing with lots of shared pointers connecting the edges and vertices. Edges (and vertices) also have attributes attached to them.

One of these attributes is an Account: Attribute is the base class and Account is a derived class (by way of Position). Account also inherits from Idable, which is serializable as well. Here are some pertinent snippets of code which show some of my cereal usage. I'll explain the issue after this context:

Attribute.hpp/cpp

class Attribute {
...

template<class Archive> void serialize(Archive&)
{   
}   

friend class cereal::access;
...

CEREAL_REGISTER_TYPE(mgraph::Attribute)

Idable.hpp/cpp

class Idable {
...

Id id;

template<class Archive> void serialize(Archive& archive)
{
    archive(cereal::make_nvp("id", id)); 
}

template<class Archive> static void load_and_construct(Archive& ar, cereal::construct<mcommon::Idable>& construct)
{
    mcommon::Id id;
    ar(id);
    construct(id);
}

friend class cereal::access;
...

CEREAL_REGISTER_TYPE(mcommon::Idable)

Position.hpp/cpp

class Position
: public mgraph::Attribute
  , public mcommon::Displayable {

template<class Archive> void serialize(Archive& archive)
{   
    archive(cereal::make_nvp("Attribute",
                             cereal::base_class<mgraph::Attribute>(this)));
}   

friend class cereal::access;
...

CEREAL_REGISTER_TYPE(mfin::Position)

Account.hpp/cpp

class Account
: public mcommon::Idable
  , public Position {
...
Currency balance;

template<class Archive> void serialize(Archive& archive)
{   
    archive(cereal::make_nvp("Idable",
                             cereal::base_class<mcommon::Idable>(this)),
            cereal::make_nvp("Position",
                             cereal::base_class<mfin::Position>(this)),
            cereal::make_nvp("balance", balance));
}

template<class Archive> static void load_and_construct(Archive& ar, cereal::construct<Account>& construct)
{
    mcommon::Id iden;
    Currency::Code code;
    ar(iden, code);
    construct(iden, code);
}

friend class cereal::access;
...

CEREAL_REGISTER_TYPE(mfin::Account)

So the problem comes when an mfin::Account is being serialized. The mfin::Account is held in a std::list of shared pointers. By the time we get down into the serialize function for Idable, the object is invalid.

Going into gdb, which halts on a segfault, I go up a few stack frames to this line: /usr/include/cereal/types/polymorphic.hpp:341, which is:

(gdb) list
336 
337     auto binding = bindingMap.find(std::type_index(ptrinfo));
338     if(binding == bindingMap.end())
339       UNREGISTERED_POLYMORPHIC_EXCEPTION(save, cereal::util::demangle(ptrinfo.name()))
340 
341     binding->second.shared_ptr(&ar, ptr.get());
342   }
343 
344   //! Loading std::shared_ptr for polymorphic types
345   template <class Archive, class T> inline

Here is what ptr is at that point:

(gdb) print *((mfin::Account*)(ptr.get()))
$10 = {<mcommon::Idable> = {_vptr.Idable = 0x4f0d50 <vtable for mfin::Account+16>, id = "bank"}, <mfin::Position> = {<mgraph::Attribute> = {
      _vptr.Attribute = 0x4f0d78 <vtable for mfin::Account+56>}, <mcommon::Displayable> = {_vptr.Displayable = 0x4f0da0 <vtable for mfin::Account+96>}, <No data fields>}, balance = {<mcommon::Displayable> = {
      _vptr.Displayable = 0x4f0570 <vtable for mfin::Currency+16>}, amount = 0, code = mfin::Currency::USD}}
(gdb) print ptr
$11 = std::shared_ptr (count 3, weak 0) 0x758ad0

Everything looks good. But notice what happens when I cast it to a void* first:

(gdb) print *((mfin::Account*)((void*)ptr.get()))
$12 = {<mcommon::Idable> = {_vptr.Idable = 0x4f0d78 <vtable for mfin::Account+56>, 
    id = "\363aL\000\000\000\000\000PbL\000\000\000\000\000\304\031L\000\000\000\000\000\021#L", '\000' <repeats 13 times>, " \232N", '\000' <repeats 21 times>, "P\251@\000\000\000\000\000\370\377\377\377\377\377\377\377 \232N", '\000' <repeats 21 times>, "\304\031L\000\000\000\000\000P\251@", '\000' <repeats 45 times>, "St19_Sp_counted_deleterIPN4mfin7AccountE"...}, <mfin::Position> = {<mgraph::Attribute> = {
      _vptr.Attribute = 0x4f0570 <vtable for mfin::Currency+16>}, <mcommon::Displayable> = {_vptr.Displayable = 0x0}, <No data fields>}, balance = {<mcommon::Displayable> = {_vptr.Displayable = 0x0}, amount = 49, 
    code = (unknown: 7702648)}}

This is exactly the conversion that happens in the call to binding->second.shared_ptr (seen below), which takes a const void*.

(gdb) list
295             writeMetadata(ar);
296 
297             #ifdef _MSC_VER
298             savePolymorphicSharedPtr( ar, dptr, ::cereal::traits::has_shared_from_this<T>::type() ); // MSVC doesn't like typename here
299             #else // not _MSC_VER
300             savePolymorphicSharedPtr( ar, dptr, typename ::cereal::traits::has_shared_from_this<T>::type() );
301             #endif // _MSC_VER
302           };
303 
304         serializers.unique_ptr =

What is wrong in my usage of cereal that would cause this? Here is the final error I get:

Program received signal SIGSEGV, Segmentation fault.
0x000000000040f7cd in rapidjson::Writer<rapidjson::GenericWriteStream, rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::WriteString (this=0x7fffffffd358, 
    str=0x4f1ae0 <vtable for mfin::Account+96> "\363aL", length=4989722) at /usr/include/cereal/external/rapidjson/writer.h:276
276             if ((sizeof(Ch) == 1 || characterOk(*p)) && escape[(unsigned char)*p])  {
Missing separate debuginfos, use: debuginfo-install boost-date-time-1.55.0-8.fc21.x86_64 boost-filesystem-1.55.0-8.fc21.x86_64 boost-program-options-1.55.0-8.fc21.x86_64 boost-system-1.55.0-8.fc21.x86_64 boost-thread-1.55.0-8.fc21.x86_64 fcgi-2.4.0-24.fc21.x86_64 glog-0.3.3-3.128tech.x86_64 libgcc-4.9.2-1.fc21.x86_64 libstdc++-4.9.2-1.fc21.x86_64

Solution

  • OK, after much investigation I believe I have the answer to my problem, and I believe it is a bug in the library. Once I have confirmed this with the library owners I will update this answer with the results.

    I have produced a simple program below which demonstrates the problem. The issue stems from the combination of multiple inheritance, polymorphism, and casting. In the program below, notice where we create a Derived object. When laid out in memory, the Derived object looks approximately like this:

    Derived:
      Base2::vtable
      Base2::var
      Base::vtable
    

    Consider:

    (gdb) print ptr
    $2 = std::shared_ptr (count 1, weak 0) 0x63c580
    (gdb) print *ptr
    $3 = (Derived &) @0x63c580: {<Base2> = {_vptr.Base2 = 0x421f90 <vtable for Derived+16>, var = ""}, <Base> = {_vptr.Base = 0x421fa8 <vtable for Derived+40>}, <No data fields>}
    

    Now when we dynamic_pointer_cast it to Base, the pointer moves forward to the Base subobject (0x63c590 rather than 0x63c580):

    (gdb) print ptr
    $8 = std::shared_ptr (count 2, weak 0) 0x63c590
    (gdb) print *ptr
    $9 = (Base &) @0x63c590: {_vptr.Base = 0x421fa8 <vtable for Derived+40>}
    

    This is where the problem begins. On line 341 of /usr/include/cereal/types/polymorphic.hpp we have this ptr to Base, and it is passed on like so:

    binding->second.shared_ptr(&ar, ptr.get());
    

    This call converts the Base* to a const void*. Later on, however, based on the type info, that void pointer is cast to the registered polymorphic type. Since the shared_ptr points to an object of the Derived type, this means a Derived*, as seen below:

    272       static inline void savePolymorphicSharedPtr( Archive & ar, void const * dptr, std::false_type /* has_shared_from_this */ )
    273       {
    274         PolymorphicSharedPointerWrapper psptr( dptr );
    275         ar( CEREAL_NVP_("ptr_wrapper", memory_detail::make_ptr_wrapper( psptr() ) ) );
    276       }
    

    This means that, down the stack, ptr (a Base*) was cast to void* and then cast to Derived*. No pointer adjustment is applied anywhere along that chain, so the result does not point at the start of the Derived object. As seen below, ptr is now invalid:

    (gdb) print *ptr
    $7 = (const Derived &) @0x63c590: {<Base2> = {_vptr.Base2 = 0x421fa8 <vtable for Derived+40>, var = <error reading variable: Cannot access memory at address 0x49>}, <Base> = {_vptr.Base = 0x0}, <No data fields>}
    

    The pointer is pointing at the vtable pointer for Base rather than at the start of Derived/Base2 as it should, so the program crashes partway through writing the output:

    {
        "ptr": {
            "polymorphic_id": 2147483649,
            "polymorphic_name": "Derived",
            "ptr_wrapper": {
                "id": 2147483649,
                "data": {
                    "Base2": {
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff7b8e9e3 in std::string::size() const () from /lib64/libstdc++.so.6
    

    Below is a sample program which reproduces this:

    // g++ test.cpp -std=c++11 -ggdb -o test && gdb ./test
    #include <cereal/archives/json.hpp>
    #include <cereal/types/polymorphic.hpp>
    #include <iostream>
    #include <memory>  // std::make_shared, std::dynamic_pointer_cast
    #include <string>  // std::string
    
    struct Base {
        virtual void foo() { } 
        template<class Archive> void serialize(Archive& archive) { } 
    };
    
    struct Base2 {
        virtual void foo() { } 
        std::string var;
        template<class Archive> void serialize(Archive& archive) {   
            archive(cereal::make_nvp("var", var));
        }   
    };
    
    struct Derived : public Base2, public Base {
        template<class Archive> void serialize(Archive& archive) {   
            archive(cereal::make_nvp("Base2",
                                     cereal::base_class<Base2>(this)),
                    cereal::make_nvp("Base",
                                     cereal::base_class<Base>(this)));
        }   
    };
    
    CEREAL_REGISTER_TYPE(Base);
    CEREAL_REGISTER_TYPE(Base2);
    CEREAL_REGISTER_TYPE(Derived);
    
    int main() {
        auto ptr = std::make_shared<Derived>();
        cereal::JSONOutputArchive ar(std::cout);
        ar(cereal::make_nvp("ptr", std::dynamic_pointer_cast<Base>(ptr)));
    
        return 0;
    }