c++execution-time

C++ measured execution time is nonsensical


I am trying to beat pow function of C++ by using Taylor Series, Newton's Method, continued fraction and such, the goal is to get a good approximation faster than pow.

I wrote some functions to benchmark against pow, I tried to use std::chrono::steady_clock to measure the execution time. My idea is to run the function 1048576 times and divide the duration by 1048576 to get single pass time:

#include <chrono>
#include <cmath>
#include <iostream>
#include <string>

using std::chrono::steady_clock;
using std::chrono::duration;
using std::cout;

inline float power(float base, int exp) {
    float m = 1.0;
    for (int i = 0; i < exp; i++) {
        m *= base;
    }
    return m;
}

float newton(float base, int exp, int lim) {
    float i = 1.0 / exp;
    float r = 1 + (base - 1) / exp;
    exp--;
    for (int c = 0; c < lim; c++) {
        r = i * (exp * r + base / power(r, exp));
    }
    return r;
}

int main()
{
    auto start = steady_clock::now();
    float r;
    for (int64_t i = 0; i < 1048576; i++) {
        r = newton(256.0, 3, 16);
    }
    auto end = steady_clock::now();
    duration<double, std::nano> time = end - start;
    cout << "newton(256, 3, 16): " << time.count() / 1048576 << " nanoseconds\n";
}

But the resultant measured values don't make sense at all:

PS C:\Users\Xeni> C:\Users\Xeni\source\repos\exponentiation\x64\Release\exponentiation.exe
newton(256, 3, 16): 0 nanoseconds
PS C:\Users\Xeni> C:\Users\Xeni\source\repos\exponentiation\x64\Release\exponentiation.exe
newton(256, 3, 16): 9.53674e-05 nanoseconds
PS C:\Users\Xeni> C:\Users\Xeni\source\repos\exponentiation\x64\Release\exponentiation.exe
newton(256, 3, 16): 9.53674e-05 nanoseconds
PS C:\Users\Xeni> C:\Users\Xeni\source\repos\exponentiation\x64\Release\exponentiation.exe
newton(256, 3, 16): 9.53674e-05 nanoseconds

My CPU runs at 3GHz, that is 3 billion clock cycles per second, so a clock cycle is about 3.33333333333333E-10 seconds, if the result is to be believed, then the computation happened in about 1e-4 nanoseconds, which is 1e-13 seconds, less than a clock cycle.

I compiled with Visual Studio 2022, compiler arguments:

/permissive- /ifcOutput "x64\Release\" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /Ob1 /sdl /Fd"x64\Release\vc143.pdb" /Zc:inline /fp:fast /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /std:c17 /Gd /Oi /MD /std:c++20 /FC /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Ot /Fp"x64\Release\exponentiation.pch" /diagnostics:column

I measured the execution of the whole program in PowerShell, and calculated the single pass to take about 8 nanoseconds, if PowerShell is correct:

PS C:\Users\Xeni> measure-command {C:\Users\Xeni\source\repos\exponentiation\x64\Release\exponentiation.exe}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 8
Ticks             : 89682
TotalDays         : 1.03798611111111E-07
TotalHours        : 2.49116666666667E-06
TotalMinutes      : 0.00014947
TotalSeconds      : 0.0089682
TotalMilliseconds : 8.9682



PS C:\Users\Xeni> 8.9682/1048576*1e6
8.55274200439453

What is wrong with my code? How can I fix it?


Solution

  • You can use volatile variables to stop the compiler from optimizing away important things. You have to make sure that the the computation actually needs to be performed and that the result is actually used:

    // external linkage
    volatile float BASE = 256.0;
    volatile float r = 0.0f;
    
    int main()
    {
        auto start = steady_clock::now();
        for (int64_t i = 0; i < 1048576; i++) {
            // BASE could be anything, because it's extern + volatile
            // and r must be stored, because it's externally visible
            r = newton(BASE, 3, 16);
        }
        auto end = steady_clock::now();
        duration<double, std::nano> time = end - start;
        cout << "newton(256, 3, 16): " << time.count() / 1048576 << " nanoseconds\n";
    }