c++, c++11, visual-studio-2010, visual-c++, preprocessor

Why are my preprocessor macros seemingly randomly faster or slower than manually written code?


Yesterday at work, my colleague claimed that preprocessor macros are slower than writing the variables and functions out by hand. The context is that we have a class to which member variables are occasionally added, and for each of these member variables three different methods have to be created following exactly the same pattern. We generate these automatically with macros, as shown below.

#include <cstdint>
#include <iostream>
#include <vector>

#include <windows.h>

struct Bar
{
    long long a;
    long long b;
    long long c;
    long long d;
};

struct Foo
{
    Bar var[1300];
};

typedef std::vector<Foo> TEST_TYPE;

class A
{
private:
    TEST_TYPE container;

public:
    TEST_TYPE& getcontainer()
    {
        return container;
    }
};

#define createBMember(TYPE, NAME)         \
private:                                  \
    TYPE NAME;                            \
                                          \
public:                                   \
    TYPE& get##NAME()                     \
    {                                     \
        return NAME;                      \
    }

class B
{
    createBMember(TEST_TYPE, container);
};

double testA()
{
    A a;
    LARGE_INTEGER frequency;
    LARGE_INTEGER startA, endA;

    if (!QueryPerformanceFrequency(&frequency)) {
        std::cerr << "High-Resolution-Timer nicht unterstützt." << std::endl;
        return 1;
    }

    QueryPerformanceCounter(&startA);
    for(size_t i = 0; i < 10000; ++i)
    {
        a.getcontainer().push_back(Foo());
    }

    QueryPerformanceCounter(&endA);

    return static_cast<double>(endA.QuadPart - startA.QuadPart) / frequency.QuadPart;
}

double testB()
{
    B b;
    LARGE_INTEGER frequency;
    LARGE_INTEGER startB, endB;

    if (!QueryPerformanceFrequency(&frequency)) {
        std::cerr << "High-resolution timer not supported." << std::endl;
        return 1;
    }

    QueryPerformanceCounter(&startB);

    for(size_t i = 0; i < 10000; ++i)
    {
        b.getcontainer().push_back(Foo());
    }

    QueryPerformanceCounter(&endB);

    return static_cast<double>(endB.QuadPart - startB.QuadPart) / frequency.QuadPart;
}

//----------------------------------------------------[main]
int main()
{
    double Atest = 0;
    double Btest = 0;

    double AHigh = 0;
    double BHigh = 0;

    double ALow = 10000;
    double BLow = 10000;

    double a;
    double b;

    const uint16_t amount = 30;

    for(uint16_t i = 0; i < amount; ++i)
    {   
        a = testA();

        AHigh = a > AHigh ? a : AHigh;
        ALow = a < ALow ? a : ALow;

        Atest += a;
    }

    for(uint16_t i = 0; i < amount; ++i)
    {   
        b = testB();

        BHigh = b > BHigh ? b : BHigh;
        BLow = b < BLow ? b : BLow;

        Btest += b;
    }

    Atest /= amount; 
    Btest /= amount; 

    std::cout << "A: " << Atest << std::endl;
    std::cout << "B: " << Btest << std::endl;

    auto size = sizeof(Foo);

    return 0;
}

I tried to refute his claim with the test above: a fairly large struct that I simply push into a vector many times in each test run.

The strange thing, however, was that although the preprocessor runs before compilation and both classes should therefore be identical, I still measured noticeable speed differences between the two tests.

This confuses me, because as I said, both classes are identical.

I should also mention that our development environment unfortunately ties us to Visual Studio 2010, which only implements an early snapshot of C++11. That's why I can't use std::chrono for benchmarking, for example.

I haven't been able to test it with other compilers yet. I also looked at the assembly code on godbolt.org, but didn't find anything that could make such a big difference.

Admittedly, I'm still a trainee and would classify my skills as more of an amateur. Does anyone have any idea what could be causing this difference in speed?


Solution

  • Your colleague doesn't know what they're talking about.

    Macros are a textual replacement performed by the preprocessor, one of the earliest phases of compilation. The actual compiler sees identical code, so any speed differences must come from other factors. As noted in the comments, flawed test methodology is almost certainly the best explanation (especially given the lack of knowledge shown both by the claim itself and by the use of VS 2010).
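
    One way to convince yourself of this is to skip the timing entirely and look at the preprocessor output (for example with cl /E or /EP in MSVC, or g++ -E). After expansion, class B is token-for-token what you would have written by hand. A minimal sketch of what the compiler actually sees for class B, assuming the createBMember macro from the question:

    // Sketch: what the preprocessor turns class B into.  Apart from the
    // trailing ';' left over from the macro invocation (which the compiler
    // accepts as an empty declaration), it is the same as class A.
    class B
    {
    private:
        TEST_TYPE container;

    public:
        TEST_TYPE& getcontainer()
        {
            return container;
        };
    };

    Since the compiler front end never sees the macro at all, there is nothing left that could generate different machine code for A and B.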
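
    The measured difference is therefore most likely an artifact of the harness itself: in your main, all testA runs happen before all testB runs, so one side pays the one-time costs (first-touch page faults for the freshly allocated vector storage, heap and cache warm-up, CPU frequency ramp-up) for both. A hedged sketch of a fairer driver loop, assuming the testA/testB functions from the question and simply alternating which one goes first:

    // Hypothetical replacement for the original main(): interleave the two
    // tests and alternate their order so neither systematically benefits
    // from running second in an already warmed-up process.
    int main()
    {
        const uint16_t amount = 30;
        double Atest = 0;
        double Btest = 0;

        for (uint16_t i = 0; i < amount; ++i)
        {
            if (i % 2 == 0) { Atest += testA(); Btest += testB(); }
            else            { Btest += testB(); Atest += testA(); }
        }

        std::cout << "A: " << Atest / amount << std::endl;
        std::cout << "B: " << Btest / amount << std::endl;

        return 0;
    }

    With the order alternated (or randomized) like this, the remaining difference between A and B should shrink to measurement noise.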