c++ qt class binary-compatibility

Why doesn't adding a member variable to a class in a shared library break binary compatibility?


I wanted to build an example to test my understanding of binary compatibility, but it did not work out. I changed the layout of a class in the DLL by adding members at the beginning or in the middle, expecting that the existing variables could no longer be accessed correctly or that accessing them would crash. However, everything works. No matter where I add a member variable in the class, nothing crashes and binary compatibility does not appear to break. My code is as follows:

//untitled1_global.h
#include <QtCore/qglobal.h>

#if defined(UNTITLED1_LIBRARY)
#  define UNTITLED1_EXPORT Q_DECL_EXPORT
#else
#  define UNTITLED1_EXPORT Q_DECL_IMPORT
#endif
//base.h
class UNTITLED1_EXPORT Base
{
public:
    Base();

    double getA();
    double getB();

private:
    int arr[100]; //Add later to update the DLL
    double a;
    double b;
};
//derived.h
#include "dpbase.h"
class UNTITLED1_EXPORT Derived :  public Base
{
public:
    Derived();
    void setC(double d);
    double getC();

private:
    char arrCh[100]; //Add later to update the DLL
    double c;
};

Below is the client code. The base.h and derived.h it includes are not the same as the ones in the DLL: one copy has the added members commented out and the other does not. Declarations and implementations are separate in the DLL. I tried accessing the variables directly and accessing them through functions (as in the commented-out code at the beginning of main.cpp).

//main.cpp
#include "dpbase.h"
#include "dpbase2.h"
#include <QDebug>

#include <QApplication>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);

    Base base;
    qDebug() << base.getA();
    qDebug() << base.getB();

    Derived base2;
    base2.setC(50);
    qDebug() << base2.getC();

    return a.exec();
}

Both classes, Base and Derived, are exported from the DLL. No matter where I add a member variable in Base or Derived, nothing crashes and binary compatibility does not appear to break.

I am using Qt. There is a similar question here, but it did not help me.

Furthermore, when I delete all member variables of the class in the DLL, the client, which links against the DLL, can still use the now nonexistent variables: assign values, read them back, and so on. It seems as if the dynamic library reserved enough space for the client to redefine them, even though no member variable is defined. So strange!

My question is: why does changing the layout of a class's members in the DLL not break binary compatibility? And after deleting all member variables of the class in the DLL, why can the caller still use the members declared in its .h file?


Solution

  • First of all you have to understand how members are accessed.

    Let's take

    class UNTITLED1_EXPORT Base
    {
    public:
        Base();
    
        double getA();
        double getB();
    
    private:
        double a;
        double b;
    };
    

    When you access a it is like doing *(this + 0). When you access b it is like doing *(this + 8) (assuming double is 8 bytes).

    Then when you change your class like so:

    class UNTITLED1_EXPORT Base
    {
    public:
        Base();
    
        double getA();
        double getB();
    
    private:
        int arr[100];
        double a;
        double b;
    };
    

    When you access a it will do *(this + 400) and when you access b it will do *(this + 408).
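    The offsets above can be checked with offsetof. The following is a minimal sketch, not the code from the question: BaseV1 and BaseV2 are hypothetical stand-ins for the two versions of Base, with public members so that offsetof is well defined, and the numbers assume a 4-byte int and an 8-byte double (typical on x86-64).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for Base before and after adding arr.
// Members are public so that offsetof is well defined.
struct BaseV1 { double a; double b; };
struct BaseV2 { int arr[100]; double a; double b; };

// Byte offsets of a and b inside one version of the class.
struct MemberOffsets { std::size_t a; std::size_t b; };

MemberOffsets offsetsBefore() { return {offsetof(BaseV1, a), offsetof(BaseV1, b)}; }
MemberOffsets offsetsAfter()  { return {offsetof(BaseV2, a), offsetof(BaseV2, b)}; }
```

    Under those assumptions, offsetsBefore() yields 0 and 8, and offsetsAfter() yields 400 and 408. These are the constants the compiler bakes into every piece of code that touches a or b.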

    Now this may or may not be an issue. If you access a and b only through getA() and getB() and they are defined in a .cpp in your DLL, then you will update the definitions of the getters at the same time you update your class.

    You could create some weird behaviors by making the definition of the getters inline:

    class UNTITLED1_EXPORT Base
    {
    public:
        Base();
    
        double getA() { return a; }
        double getB();
    
    private:
        double a;
        double b;
    };
    

    In this case your .exe might have its own copy of getA(). This means that even after you update your DLL and add int arr[100], your .exe will still try to access *(this + 0), which is now occupied by arr.

    This is undefined behavior, but it will not necessarily make your program crash, because the memory you are accessing is still allocated.

    If you do the opposite:

    1. Compile your exe with the arr
    2. Remove arr
    3. Build the DLL
    4. Run the .exe

    Then you are more likely to have a crash, because the exe will try to access this + 400 while Base is only 16 bytes. But it is still not guaranteed to crash, for multiple reasons. For instance, this + 400 might happen to be valid memory. More importantly, it depends on where the memory for Base was allocated. If you do new Base in your .exe, it will allocate 416 bytes even after you changed the DLL. But if you do new Base in your DLL, it will allocate only 16 bytes.
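    The point that the allocation size is fixed by whoever compiles the new expression can be illustrated in a single program. This is only a sketch under assumptions: RecordsAllocation, BaseWithoutArr, BaseWithArr and bytesRequestedByNew are hypothetical names, the two structs stand in for the DLL's and the exe's view of Base, and the sizes assume a 4-byte int and an 8-byte double.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Bytes requested by the most recent new expression on a tracked type.
static std::size_t lastRequestedBytes = 0;

// Class-specific operator new that records how many bytes were asked for.
struct RecordsAllocation {
    static void* operator new(std::size_t bytes) {
        lastRequestedBytes = bytes;
        return ::operator new(bytes);
    }
    static void operator delete(void* p) { ::operator delete(p); }
};

// Hypothetical views of Base: as the rebuilt DLL sees it, and as the old exe sees it.
struct BaseWithoutArr : RecordsAllocation { double a; double b; };
struct BaseWithArr    : RecordsAllocation { int arr[100]; double a; double b; };

// Runs `new T`, frees the object, and reports the size passed to operator new.
template <typename T>
std::size_t bytesRequestedByNew() {
    T* object = new T;
    delete object;
    return lastRequestedBytes;
}
```

    Under those assumptions, bytesRequestedByNew<BaseWithoutArr>() reports 16 and bytesRequestedByNew<BaseWithArr>() reports 416. The size comes from the class definition visible where new is compiled, not from the module that provides the constructor.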


    Here is an example.

    Here is the header for the .exe:

    class Base
    {
    public:
        Base();
        double getA() const { return a; }
        double getB() const;
        static Base *create();
        void print();
    private:
        double a;
        double b;
    };
    

    The header for the DLL

    class Base
    {
    public:
        Base();
        double getA() const { return a; }
        double getB() const;
        static Base *create();
        void print();
    private:
        int arr[100];
        double a;
        double b;
    };

    The code of the DLL:

    #include <iostream>

    Base::Base()
    {
        for (int i = 0 ; i < 100 ; ++i) {
            arr[i] = i;
        }
        a = 3.14;
        b = 1.42;
    }
    
    double Base::getB() const { return b; }
    
    Base *Base::create()
    {
        return new Base();
    }
    
    void Base::print()
    {
        std::cout << a << std::endl;
        std::cout << b << std::endl;
    }
    

    And the code in my exe is:

    Base *b = Base::create();
    
    std::cout << b->getA() << std::endl;
    std::cout << b->getB() << std::endl;
    b->print();
    

    Normally I would expect the output:

    3.14
    1.42
    3.14
    1.42
    

    But in practice I have:

    2.122e-314
    1.42
    3.14
    1.42
    

    The reason is that for the first line (2.122e-314) the exe looks for a at address this + 0, but since this memory is now occupied by arr, it is as if we had written std::cout << *((double *)arr). The values for b are not affected because b is always read by the DLL, as getB() is not inlined.
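    The stale-offset read can be reproduced safely inside one program. This is a sketch, not the DLL setup itself: NewLayout, makeLikeDllConstructor and readAWithStaleOffset are hypothetical names, and memcpy is used instead of a pointer cast so that the read is well defined.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for Base as the rebuilt DLL lays it out.
struct NewLayout { int arr[100]; double a; double b; };

// Fills the object the same way the DLL's constructor does.
NewLayout makeLikeDllConstructor() {
    NewLayout object;
    for (int i = 0; i < 100; ++i) {
        object.arr[i] = i;
    }
    object.a = 3.14;
    object.b = 1.42;
    return object;
}

// Reads a double from byte offset 0, which is where code compiled
// against the old header still expects to find a.
double readAWithStaleOffset(const NewLayout& object) {
    double value;
    std::memcpy(&value, &object, sizeof value);
    return value;
}
```

    Here the stale read returns the bytes of arr[0] and arr[1] reinterpreted as a double. With a 4-byte int on a little-endian machine that is a tiny denormal number rather than 3.14, which matches the first line of the output above.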

    Now if I replace Base *b = Base::create(); with Base *b = new Base(); the program crashes: because the new is done from the .exe, it allocates only 16 bytes, while the DLL's Base() constructor writes its member variables across 416 bytes.

    Note

    For the purpose of this answer I wrote pointer arithmetic as if the increment were always 1 byte. In reality it is not: pointer arithmetic advances in units of the pointed-to type.

    So when I write this + 8, it is to be understood in C++ as reinterpret_cast<double *>(reinterpret_cast<char *>(this) + 8).
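    As a small illustration of that notation, the helper below reads a double at a given byte offset from the start of an object. It is a sketch only: readDoubleAt and Pair are hypothetical names, and memcpy is used in place of the reinterpret_cast dereference to avoid aliasing and alignment problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical two-member class with the same layout as the original Base.
struct Pair { double a; double b; };

// Reads the double stored byteOffset bytes after the start of object.
// This is what "*(this + byteOffset)" stands for in the answer.
double readDoubleAt(const void* object, std::size_t byteOffset) {
    double value;
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    std::memcpy(&value, bytes + byteOffset, sizeof value);
    return value;
}
```

    For a Pair p, readDoubleAt(&p, 0) returns p.a and readDoubleAt(&p, offsetof(Pair, b)) returns p.b.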