c++clang++apple-m1integer-arithmeticunsigned-long-long-int

Wrong result on modular arithmetic on ARM (Apple M1) with clang -O3 optimization


I am pulling my hair out for the last couple of days with this "innocuous" piece of code (minimal reproducible example, part of a larger modular multiplication routine):

#include <iostream>
#include <limits>

using ubigint = unsigned long long int;
using bigint = long long int;

void modmul(bigint a, bigint b, ubigint p) {
    ubigint ua = a < 0 ? -a : a;
    ubigint ub = b < 0 ? -b : b;

    ua %= p;
    ub %= p;

    std::cout << "ua: " << ua << '\n';
}

int main() {
    bigint minbigint = std::numeric_limits<bigint>::min();
    bigint maxbigint = std::numeric_limits<bigint>::max();
    std::cout << "minbigint: " << minbigint << '\n';
    std::cout << "maxbigint:  " << maxbigint << '\n';

    modmul(minbigint, maxbigint, 2314); // expect ua: 2036, got ua: 0
}

I am compiling on macOS 11.4 with clang 12.0 installed from Homebrew

clang version 12.0.0 
Target: arm64-apple-darwin20.5.0 
Thread model:posix 
InstalledDir: /opt/homebrew/opt/llvm/bin

When compiling with clang -O1, the program spits out the expected result (in this case, 2036, I've checked with Wolfram Mathematica, Mod[9223372036854775808, 2314], and this is correct). However, when I compile with clang -O2 or clang -O3 (full optimization), somehow the variable ua is zeroed out (its value becomes 0). I am at a complete loss here, and have no idea why this happens. IMO, there's no UB, nor overflow, or anything dubious in this piece of code. I'd greatly appreciate any advice, or if you can reproduce the issue on your side.

PS: the code behaves as expected on any other platforms, including Windows/Linux/FreeBSD/Solaris), with any combination of compilers. I'm only getting this error on Apple M1 with clang 12 (didn't test with other compilers on M1).


Solution

  • UPDATE: As @harold pointed out in the comment section, negq and subq from 0 is exactly the same. So the my discussion related to negq and subq below is incorrect. Please disregard that part, sorry for not double checking before posting answer.

    About the original question, I recompile a slightly simpler version of the code godbolt and find out that the problematic compiler's optimization is in main not modmul. In main, clang see that all of its operands for modmul is constant so it decided to do the computation of modmul at compile time. When calculating ubigint ua = a < 0 ? -a : a;, clang find out that is signed integer overflow UB so it decided to return 0 and print out. That may seem to be a radical thing to do but it's legal because of UB. Moreover, since there is no mathematically correct answer due to the limitation of two's compliment system, return 0 is arguably as good (or as bad) as any other result.


    OLD ANSWER BELOWS

    As some one pointed out in the comment section, the 2 lines below in your code is undefined behavior - signed integer overflow UB.

        ubigint ua = a < 0 ? -a : a;
        ubigint ub = b < 0 ? -b : b;
    

    If you wonder what exactly clang does under the hood to produce 2 different results at 2 different optimization levels, consider a simple example as following.

    using ubigint = unsigned long long int;
    using bigint = long long int;
    
    ubigint
    negate(bigint a)
    {
        ubigint ua = -a;
        return ua;
    }
    

    When compile with -O0

    negate(long long):                             # @negate(long long)
            pushq   %rbp
            movq    %rsp, %rbp
            movq    %rdi, -8(%rbp)
            xorl    %eax, %eax
            subq    -8(%rbp), %rax  # Negation is performed here
            movq    %rax, -16(%rbp)
            movq    -16(%rbp), %rax
            popq    %rbp
            retq
    

    Compile with -O3

    negate(long long):                             # @negate(long long)
            movq    %rdi, %rax
            negq    %rax  # Negation is performed here
            retq
    

    At -O0, clang use normal subq instruction which perform binary subtraction of 0 and %rax and produce results with integer-wrap-around behavior.

    At -O3, clang can do better, it use negq instruction which only replace the operand with its two's complement (i.e flip all the bits and add 1). However, you can see that this optimization is only legal if signed integer overflow is undefined behavior (hence the compiler can just ignore overflow cases). If the standard required integer-wrap-around behavior, clang must fall back to the unoptimized version.