c++ linker undefined-behavior one-definition-rule

Wrong constructor being called causes segmentation fault


I have two classes that are declared independently, each in its own header, with their methods defined in their own TUs (.cpp files).

The classes have the same name and the same namespace, but they live in different directories in the code base and have different functionality. They look like the following:

core1/abc1.h:

namespace ns {
   class abc {
   ....
      std::map<std::string,int> m;  
   };
}

core1/abc1.cpp:

    namespace ns {
       abc::~abc() {
       
      };
    }

core2/abc2.h:

namespace ns {
   class abc {
   
      std::unordered_map<std::string,int> m;  
   };
}
  

core2/abc2.cpp:

namespace ns {
   abc::~abc() {
   
  };
}

When I run my program, which uses both classes, the destructors are called as the program ends. According to gdb, for the instance whose type comes from abc2.h, the destructor for the abc1.h type is called instead, which then causes a segfault.

I see the error with both g++ and clang. Could this be an undefined behavior issue, or potentially a linker bug? (The linker is gold.)


When the symbol name is built for the linker, is the file name taken into account, or are the namespace, class, and method names the only factors?


Solution

  • What is going on here comes down to name mangling (also called name decoration). The C++ standard does not specify a mangling scheme; it is left to the implementation (in practice, the platform ABI). Because the members of your two classes end up with identical mangled names, the program violates the One Definition Rule (ODR).

    Given a translation unit (TU), which is the preprocessed form of a source file (typically a .cpp file), the compiler uses this mechanism to name externally visible parts of the code, such as class methods, free functions, and global variables, so that each one is uniquely identifiable across the program.

    For example, let's assume the following TU from a file called file.cpp:

    #include <...>
    #include <...>
    
    namespace stuff {
    
      bool foo(int v) {....}
      bool foo(double v) {....}
    
    }
    

    The name-mangled versions of foo might look something like the following (purely illustrative; there is no common format, and the real one depends on the implementation):

    stuff_bool_foo_int
    stuff_bool_foo_double
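
    For comparison, toolchains that follow the Itanium C++ ABI (GCC and Clang on Linux, for example) can decode their real mangled names with abi::__cxa_demangle from <cxxabi.h>. The sketch below assumes such a toolchain; demangle is a hypothetical helper name, not part of any library.

    ```cpp
    #include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI only)
    #include <cstdlib>    // std::free
    #include <string>

    // Hypothetical helper: returns the readable form of an Itanium-ABI
    // mangled name, or the input unchanged if it cannot be demangled.
    std::string demangle(const char* mangled) {
        int status = 0;
        char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
        if (status != 0 || readable == nullptr) {
            return mangled;
        }
        std::string result(readable);
        std::free(readable);  // the buffer is allocated with malloc
        return result;
    }
    ```

    Under that ABI, demangle("_ZN5stuff3fooEi") yields stuff::foo(int) and demangle("_ZN2ns3abcD1Ev") yields ns::abc::~abc(). The namespace, class, function name, and parameter types are encoded; the file name and directory are not.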
    

    These names are attached to the code as identifying "symbols" all the way from the compiler front end, through the middle layer (IR), to the back end (assembler and linker).

    https://en.wikipedia.org/wiki/Name_mangling

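    The issue you are seeing is similar to what would happen if the int version of the foo free function were also defined in an identically named namespace in another TU. Note that in the common mangling schemes the file name and directory are not part of the symbol name, so ns::abc::~abc() produces the same symbol in both of your TUs.

    What happens next depends on the toolchain, but generally speaking the following occurs in the linking stage: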

    1. The linker gathers all the symbols and their associated object code.
    2. One by one, it adds these pairs to a map-like structure, e.g. code_map[symbol_id] = object_code
    3. During linking, when a reference to a symbol is encountered, it looks up the symbol to get the associated object_code.
    4. The linker resolves the reference to that object code. If the symbol is not found in the map, you will see an undefined symbol diagnostic.

    The issue you're seeing is happening at step 2 - given the data structure is most likely a map and not a multi-map, the second time the symbol name/key is seen the implementation may either:

    1. Override the original value
    2. Ignore the second value

    Either way, at least one call site (reference point) will end up bound to the wrong object code, and this is why C++ has the One Definition Rule (ODR):

    https://en.cppreference.com/w/cpp/language/definition
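
    The two outcomes for a duplicate symbol can be sketched with a toy symbol table. This is only a model of the idea, not how gold or any other real linker is implemented; SymbolTable, add_keep_first, add_keep_last, and resolve are hypothetical names, and a string label stands in for the object code.

    ```cpp
    #include <map>
    #include <string>

    // Toy model: maps a mangled symbol name to a label standing in for
    // the object code. Real linkers are far more involved.
    using SymbolTable = std::map<std::string, std::string>;

    // Policy 1: the first definition wins, later duplicates are ignored.
    void add_keep_first(SymbolTable& table, const std::string& symbol,
                        const std::string& code) {
        table.emplace(symbol, code);  // no effect if the key already exists
    }

    // Policy 2: the last definition wins, earlier ones are overwritten.
    void add_keep_last(SymbolTable& table, const std::string& symbol,
                       const std::string& code) {
        table[symbol] = code;
    }

    // Looks up a symbol; throws std::out_of_range if it is undefined.
    const std::string& resolve(const SymbolTable& table,
                               const std::string& symbol) {
        return table.at(symbol);
    }
    ```

    With either policy, registering the destructor symbol once for each abc leaves a single entry in the table, so every reference resolves to the same code, including references made by the other class.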

    There is no portable way (that is, one that works across compilers and linkers) to resolve the issue other than renaming one or both of the classes so that their names are unique.

    In short, you're seeing undefined behaviour, which means anything goes.

    The simplest solution is to rename one of the classes or change the namespace of one of the classes.
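
    A minimal sketch of the namespace approach, assuming each class is moved into its own sub-namespace. The names core1 and core2 are borrowed from the directory names, and the set and size members are hypothetical stand-ins for the elided class bodies.

    ```cpp
    #include <cstddef>
    #include <map>
    #include <string>
    #include <type_traits>
    #include <unordered_map>

    namespace ns {
    namespace core1 {  // hypothetical sub-namespace for core1/abc1.h
    class abc {
    public:
        void set(const std::string& key, int value) { m[key] = value; }
        std::size_t size() const { return m.size(); }
    private:
        std::map<std::string, int> m;
    };
    }  // namespace core1

    namespace core2 {  // hypothetical sub-namespace for core2/abc2.h
    class abc {
    public:
        void set(const std::string& key, int value) { m[key] = value; }
        std::size_t size() const { return m.size(); }
    private:
        std::unordered_map<std::string, int> m;
    };
    }  // namespace core2
    }  // namespace ns

    // The two classes are now distinct types with distinct symbol names.
    static_assert(!std::is_same<ns::core1::abc, ns::core2::abc>::value,
                  "the two abc classes must be different types");
    ```

    Because the enclosing namespace is part of the mangled name, the members of ns::core1::abc and ns::core2::abc no longer collide, and both classes can be used in the same program.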