I am pulling my hair out for the last couple of days with this "innocuous" piece of code (minimal reproducible example, part of a larger modular multiplication routine):
#include <iostream>
#include <limits>
using ubigint = unsigned long long int;
using bigint = long long int;
void modmul(bigint a, bigint b, ubigint p) {
ubigint ua = a < 0 ? -a : a;
ubigint ub = b < 0 ? -b : b;
ua %= p;
ub %= p;
std::cout << "ua: " << ua << '\n';
}
int main() {
bigint minbigint = std::numeric_limits<bigint>::min();
bigint maxbigint = std::numeric_limits<bigint>::max();
std::cout << "minbigint: " << minbigint << '\n';
std::cout << "maxbigint: " << maxbigint << '\n';
modmul(minbigint, maxbigint, 2314); // expect ua: 2036, got ua: 0
}
I am compiling on macOS 11.4 with clang 12.0 installed from Homebrew
clang version 12.0.0
Target: arm64-apple-darwin20.5.0
Thread model:posix
InstalledDir: /opt/homebrew/opt/llvm/bin
When compiling with clang -O1
, the program spits out the expected result (in this case, 2036, I've checked with Wolfram Mathematica, Mod[9223372036854775808, 2314]
, and this is correct). However, when I compile with clang -O2
or clang -O3
(full optimization), somehow the variable ua
is zeroed out (its value becomes 0
). I am at a complete loss here, and have no idea why this happens. IMO, there's no UB, nor overflow, or anything dubious in this piece of code. I'd greatly appreciate any advice, or if you can reproduce the issue on your side.
PS: the code behaves as expected on any other platforms, including Windows/Linux/FreeBSD/Solaris), with any combination of compilers. I'm only getting this error on Apple M1 with clang 12 (didn't test with other compilers on M1).
UPDATE: As @harold pointed out in the comment section, negq
and subq
from 0 is exactly the same. So the my discussion related to negq
and subq
below is incorrect. Please disregard that part, sorry for not double checking before posting answer.
About the original question, I recompile a slightly simpler version of the code godbolt and find out that the problematic compiler's optimization is in main
not modmul
. In main
, clang see that all of its operands for modmul
is constant so it decided to do the computation of modmul
at compile time. When calculating ubigint ua = a < 0 ? -a : a;
, clang find out that is signed integer overflow UB so it decided to return 0 and print out. That may seem to be a radical thing to do but it's legal because of UB. Moreover, since there is no mathematically correct answer due to the limitation of two's compliment system, return 0 is arguably as good (or as bad) as any other result.
OLD ANSWER BELOWS
As some one pointed out in the comment section, the 2 lines below in your code is undefined behavior - signed integer overflow UB.
ubigint ua = a < 0 ? -a : a;
ubigint ub = b < 0 ? -b : b;
If you wonder what exactly clang does under the hood to produce 2 different results at 2 different optimization levels, consider a simple example as following.
using ubigint = unsigned long long int;
using bigint = long long int;
ubigint
negate(bigint a)
{
ubigint ua = -a;
return ua;
}
When compile with -O0
negate(long long): # @negate(long long)
pushq %rbp
movq %rsp, %rbp
movq %rdi, -8(%rbp)
xorl %eax, %eax
subq -8(%rbp), %rax # Negation is performed here
movq %rax, -16(%rbp)
movq -16(%rbp), %rax
popq %rbp
retq
Compile with -O3
negate(long long): # @negate(long long)
movq %rdi, %rax
negq %rax # Negation is performed here
retq
At -O0, clang use normal subq
instruction which perform binary subtraction of 0 and %rax and produce results with integer-wrap-around behavior.
At -O3, clang can do better, it use negq
instruction which only replace the operand with its two's complement (i.e flip all the bits and add 1). However, you can see that this optimization is only legal if signed integer overflow is undefined behavior (hence the compiler can just ignore overflow cases). If the standard required integer-wrap-around behavior, clang must fall back to the unoptimized version.