
Apps built with /Qspectre-load and /CETCOMPAT crash with EXCEPTION_STACK_OVERFLOW


Crashes when run on a Windows version and CPU that support CET (verified on Win11 23H2, i7-1365U).
Works fine on a CPU that doesn't support CET (verified on Win11 23H2, i7-10750H).
Works fine anywhere when only one of the two flags is enabled (doesn't matter which).
I verified with a debugger that my code uses very little stack space, so it can't be the regular stack that overflowed.
Happens regardless of whether I build my app as C or C++.
Affects both VS2019 and VS2022.

Repro steps

Create an empty C++ console project.
Set the following project properties for the Release configuration:
- C/C++ > Code Generation > Spectre Mitigation > All Loads (/Qspectre-load)
- Linker > Advanced > CET Shadow Stack Compatible > Yes (/CETCOMPAT)

Use the following code:


#include <iostream>

__declspec(noinline) void Increment(size_t& num) {
    num++;
}

int main() {
    std::cout << "Running...\n";
    size_t num = 0;
    while (true) {
        Increment(num);
    }
    std::cout << num;
    return 0;
}

Build the Release configuration and run it on any Windows version and Intel CPU that support CET.
Expected behaviour: program runs continuously without returning.
Actual behaviour: program crashes with EXCEPTION_STACK_OVERFLOW.


Solution

  • It's a Visual Studio compiler issue. Until Microsoft fixes it, your options are:

    1. use only one of the two flags
    2. downgrade /Qspectre-load to /Qspectre, which works with /CETCOMPAT

    Issue has been submitted to Microsoft: https://developercommunity.visualstudio.com/t/Apps-built-with-QSpectre-load-and-CETC/10949177

    Diagnosis

    If we look at the assembly (annotated with source code):

        num++;
    00007FF6296E1000  inc         qword ptr [rcx]  
    00007FF6296E1003  lfence  
    }
    00007FF6296E1006  pop         r11  <-------------------- ret converted to pop and jmp by /Qspectre-load
    00007FF6296E1008  lfence  
    00007FF6296E100B  jmp         r11  
    00007FF6296E100E  int         3  
    00007FF6296E100F  int         3  
    
    int main() {
    00007FF6296E1010  sub         rsp,28h  
        std::cout << "Running...\n";
    00007FF6296E1014  mov         rcx,qword ptr [__imp_std::cout (07FF6296E3080h)]  
    00007FF6296E101B  lfence  
    00007FF6296E101E  call        std::operator<<<std::char_traits<char> > (07FF6296E1040h)  
        size_t num = 0;
    00007FF6296E1023  mov         qword ptr [rsp+30h],0  
    00007FF6296E102C  nop         dword ptr [rax]  
        while (true) {
            Increment(num);
    00007FF6296E1030  lea         rcx,[num]  
    00007FF6296E1035  call        Increment (07FF6296E1000h)    <------------------------ exception address
        }
    00007FF6296E103A  jmp         main+20h (07FF6296E1030h)
    

    The exception address is labelled above.
    The value in exception parameter #2 is the address of the top of the CET shadow stack.
    The shadow stack is flooded with repeated entries of 00007FF6296E103A, which is the return address of Increment().
    Every call Increment (07FF6296E1000h) pushes that return address onto the shadow stack, but it is never popped.
    Eventually the shadow stack runs out of space, and the next call crashes with EXCEPTION_STACK_OVERFLOW when it tries to push the return address.

    An address on the shadow stack is supposed to be popped by a ret instruction, but /Qspectre-load converts ret into pop and jmp, which is what happened in Increment() as labelled above.
    That is why the shadow stack grew until it overflowed.
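    To make the imbalance concrete, here is a toy C++ model of the mechanism described above, not the real hardware: a std::vector stands in for the shadow stack, an ordinary ret pops and checks its entry, and the /Qspectre-load pop + jmp epilogue leaves it untouched. The names ShadowStack and RunLoop, and the capacity of 1000 entries, are hypothetical; the real Windows shadow stack size is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model of a CET shadow stack (hypothetical, not the hardware itself).
// The capacity is an arbitrary illustrative value.
class ShadowStack {
public:
    explicit ShadowStack(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Models `call`: the return address is also pushed onto the shadow stack.
    // Running out of room here is what surfaces as EXCEPTION_STACK_OVERFLOW.
    void OnCall(std::uint64_t returnAddress) {
        if (entries_.size() == capacity_)
            throw std::overflow_error("shadow stack overflow");
        entries_.push_back(returnAddress);
    }

    // Models `ret`: pops the shadow copy after checking that it matches.
    void OnRet(std::uint64_t returnAddress) {
        if (entries_.empty() || entries_.back() != returnAddress)
            throw std::runtime_error("shadow stack mismatch");
        entries_.pop_back();
    }

    // Models `pop r11 / lfence / jmp r11`: only the regular stack is used,
    // so the shadow stack is not touched at all.
    void OnPopJmp(std::uint64_t /*returnAddress*/) {}

private:
    std::size_t capacity_;
    std::vector<std::uint64_t> entries_;
};

// Runs the repro's loop and returns how many calls to Increment() completed
// before the modelled shadow stack overflowed (or `iterations` if it never did).
std::size_t RunLoop(bool retConvertedToPopJmp, std::size_t iterations) {
    constexpr std::uint64_t kReturnAddress = 0x00007FF6296E103A;  // from the listing
    ShadowStack shadow(1000);
    std::size_t completed = 0;
    try {
        for (std::size_t i = 0; i < iterations; ++i) {
            shadow.OnCall(kReturnAddress);        // call Increment
            if (retConvertedToPopJmp)
                shadow.OnPopJmp(kReturnAddress);  // /Qspectre-load epilogue
            else
                shadow.OnRet(kReturnAddress);     // ordinary ret
            ++completed;
        }
    } catch (const std::overflow_error&) {
        // The failure happens on the call, matching the real exception address.
    }
    return completed;
}
```

    Under these assumptions, RunLoop(false, 5000) completes all 5000 calls, while RunLoop(true, 5000) stops at 1000: the 1001st modelled call fails on the push, just as the real crash is reported at the call instruction rather than inside Increment().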
    This exact problem is described here: https://devblogs.microsoft.com/oldnewthing/20241015-00/?p=110374

    And this technique does not play friendly with CET: The shadow stack just grows and grows because no ret instruction is ever executed.

    A solution is touched on here: https://techcommunity.microsoft.com/blog/windowsosplatform/developer-guidance-for-hardware-enforced-stack-protection/2163340

    techniques that manually return to a previous call frame that is not the preceding call frame will also need to be shadow stack aware. In this case, it is recommended to use the _incsspq intrinsic to pop return addresses off the shadow stack so that it is in sync with the call stack.
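    Applied to a pop + jmp epilogue, that guidance would mean discarding the shadow stack entry that the call pushed. Below is a toy model of that idea, not compiler output and not a fix you can apply to MSVC's code generation: IncSsp() is a hypothetical stand-in for the _incsspq intrinsic, which advances the shadow stack pointer past a given number of 8-byte entries, and a std::vector again stands in for the shadow stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: the vector stands in for the CET shadow stack.
struct ShadowStackModel {
    std::vector<std::uint64_t> entries;

    // Models `call`: pushes the return address onto the shadow stack.
    void Call(std::uint64_t returnAddress) { entries.push_back(returnAddress); }

    // Hypothetical stand-in for _incsspq(count): drops `count` entries
    // without comparing them to anything.
    void IncSsp(std::size_t count) {
        assert(count <= entries.size());
        entries.resize(entries.size() - count);
    }
};

// Models a manual return (pop r11 / jmp r11) made shadow-stack aware by
// discarding the matching shadow entry, as the guidance recommends.
// Returns the shadow stack depth after `iterations` calls.
std::size_t RunLoopWithResync(std::size_t iterations) {
    constexpr std::uint64_t kReturnAddress = 0x00007FF6296E103A;  // from the listing
    ShadowStackModel shadow;
    for (std::size_t i = 0; i < iterations; ++i) {
        shadow.Call(kReturnAddress);  // call Increment
        // pop r11 / jmp r11 happens here and leaves the shadow entry behind...
        shadow.IncSsp(1);             // ...so it is discarded explicitly
    }
    return shadow.entries.size();
}
```

    In this model RunLoopWithResync(5000) leaves the shadow stack empty. Note the trade-off: unlike ret, this path drops the entry without checking it against the real return address.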

    Documentation for /Qspectre-load states (https://learn.microsoft.com/en-us/cpp/build/reference/qspectre-load?view=msvc-170):

    Control flow instructions that load memory, including RET and CALL, are split into a load and a control flow transfer.

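    ret was split into pop and jmp here, but call wasn't split into push and jmp. Presumably that is because a direct call such as call Increment doesn't load its target from memory, whereas ret reads its target from the stack, so only ret falls under the rule quoted above.
    Had call also been split, we wouldn't have had this problem.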
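    That claim can be checked with a toy model that tracks the regular stack and the shadow stack separately (hypothetical, not how the compiler or CPU is implemented): only a genuine call pushes to the shadow stack and only a genuine ret pops from it, so any split form bypasses it. CallForm::PushJmp is a hypothetical push + jmp split of call; the compiler does not emit it today.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: the regular stack and the CET shadow stack, tracked separately.
struct Stacks {
    std::vector<std::uint64_t> regular;
    std::vector<std::uint64_t> shadow;
};

enum class CallForm { Call, PushJmp };  // `call f` vs hypothetical `push ret / jmp f`
enum class RetForm { Ret, PopJmp };     // `ret` vs `pop r11 / jmp r11`

// Only a genuine call also pushes to the shadow stack.
void DoCall(Stacks& s, CallForm form, std::uint64_t returnAddress) {
    s.regular.push_back(returnAddress);
    if (form == CallForm::Call) s.shadow.push_back(returnAddress);
}

// Only a genuine ret checks and pops the shadow stack.
void DoReturn(Stacks& s, RetForm form) {
    assert(!s.regular.empty());
    std::uint64_t target = s.regular.back();
    s.regular.pop_back();
    if (form == RetForm::Ret) {
        assert(!s.shadow.empty() && s.shadow.back() == target);
        s.shadow.pop_back();
    }
    (void)target;  // the jump to `target` itself is not modelled
}

// Returns the shadow stack depth after `iterations` call/return pairs.
std::size_t ShadowDepthAfter(CallForm c, RetForm r, std::size_t iterations) {
    Stacks s;
    for (std::size_t i = 0; i < iterations; ++i) {
        DoCall(s, c, 0x00007FF6296E103A);  // return address from the listing
        DoReturn(s, r);
    }
    return s.shadow.size();
}
```

    In the model, the shadow stack depth after 100 iterations is 0 for call with ret, 100 for call with pop + jmp (the reported bug), and 0 for push + jmp with pop + jmp. The catch with the last form is that it never puts the return address on the shadow stack at all, so it balances the stacks only by giving up CET's return-address protection for that call.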