Compiler Optimization Causing Incorrect Results

I am doing development on a Zynq Ultrascale+ ARM core, and I am having a weird issue where compiler optimization being turned on causes a loop to be prematurely broken out of. The compiler is "aarch64-none-elf-g++".

Here is the code that is code. Originally, the program was more complicated, but in debugging, I removed as much as I could while still reproducing the bug.

#include <cstdint>
#include <string>
#include <vector>

class TestClass
{
private:
    std::vector<int> command_buffer;
public:
    TestClass() : command_buffer({}) {}
    void read_command();
};

void TestClass::read_command()
{
    printf("Before %lu\n\r", 67ul);

    size_t s = 42;
    while (true)
    {
        s = this->command_buffer.size();

        if (s > 0)
        {
            break;
        }
    }

    printf("Size %lu:\n\r", s);
}

int main()
{
    TestClass tester{ };
    tester.read_command();
}

Basically, I expect the program to loop continuously, checking the size of a vector until it is non-zero. Since I don't ever add any elements to the vector, it should loop forever. When the optimization level is set to -O0, this works as expected. However, setting it to -O2 causes the program to break out of the loop immediately and print the size of the vector, which is 0.

The console output without optimization (-O0):

Before: 67

The console output with optimization (-O2):

Before: 67
After: 0

Looking at the assembly code generated by the compiler, it looks like the size of the vector is calculated but never checked, it is just assumed to be non-zero.

.LC0:
    .string "Before: %lu\n\r"
    .align  3
.LC1:
    .string "After: %lu\n\r"
    .text
    .align  2
    .p2align 4,,11
    .global _ZN9TestClass12read_commandEv
    .type   _ZN9TestClass12read_commandEv, %function
_ZN9TestClass12read_commandEv:
.LFB2157:
    .cfi_startproc
    stp x29, x30, [sp, -32]!
    .cfi_def_cfa_offset 32
    .cfi_offset 29, -32
    .cfi_offset 30, -24
    adrp    x1, .LC0
    mov x29, sp
    str x19, [sp, 16]
    .cfi_offset 19, -16
    mov x19, x0
    add x0, x1, :lo12:.LC0
    mov x1, 67
    bl  printf
    ldp x2, x1, [x19]
    adrp    x0, .LC1
    ldr x19, [sp, 16]
    add x0, x0, :lo12:.LC1
    ldp x29, x30, [sp], 32
    .cfi_restore 30
    .cfi_restore 29
    .cfi_restore 19
    .cfi_def_cfa_offset 0
    sub x1, x1, x2
    asr x1, x1, 2
    b   printf
    .cfi_endproc

Any ideas as to why this is happening?

Solution

The code has undefined behavior.

Under the current specification in the C++ standard, a compiler is allowed to assume (e.g. for optimization purposes) that a thread will always eventually either:

terminate,
call a library IO function,
evaluate a volatile access,
evaluate an atomic operation, or
evaluate a synchronization operation

Your loop, if it doesn't bail out in the first iteration, never performs any of these actions and it also can't ever terminate under in this scenario, because no other thread can ever modify the vector without causing undefined behavior due to a data race.

So, the compiler can assume that the loop condition must not be satisfied the first time it is checked, so that the loop can be completely optimized away.

Even worse, it could see that your vector is indeed guaranteed to be empty initially, meaning that the program has guaranteed undefined behavior and could then replace the loop with e.g. a trap to indicate the impossible path being taken.

Note that this is somewhat different in C, where the compiler is not allowed to assume that such a loop terminates if its condition is an integral constant which evaluates to 1. There are currently proposals under consideration in the C++ standard committee to change the C++ rules in a similar way.

However, such a loop is usually an indication for a more fundamental problem. What would be the point of checking the loop condition each time when it can't possibly be satisfied in a later loop iteration?

If you assume that another thread will modify the vector's size, then that is your actual problem. std::vector is not thread-safe. Any attempt to modify it from another thread while the shown thread executes the loop will be a data race and cause undefined behavior.

You would need to add a mutex or some other synchronization mechanism, which would then automatically assure that the forward-progress assumptions above are satisfied.

The only real use case for an infinite loop of this kind that doesn't satisfy the assumptions is, as far as I can tell, to indefinitely halt a thread. In that case it would however simply be written as while(true); and even that only makes sense freestanding, since otherwise a loop over std::this_thread::sleep_for or std::this_thread::yield would likely be more appropriate. (Although these do not count as any of the progress actions in the list at the top either.)

In fact I missed that the changes for trivial infinite loops have already been merged into the draft for C++26.

With these changes in C++26 the list of actions above is extended to include calls to std::this_thread::yield.

Furthermore certain specific loop constructs, such as while(true); that I mentioned above, are now termed trivial infinite loops and are also added to the list and don't have undefined behavior anymore. Notably, in contrast to C, this applies only to loops with empty iteration statements.

Except on freestanding implementations (where is is implementation-defined) trivial infinite loops are replaced with a loop that has the same behavior as while(true) std::this_thread::yield();.