I know the formula for positive numbers. For example, given a positive integer X and any power of two Y = 2^K, we have
X / Y == X >> K
X % Y == X & (Y - 1)
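For non-negative X, both identities can be checked directly. This is a minimal sketch that assumes 32-bit int and 0 <= K < 31. The helper names are hypothetical:

```cpp
// Hypothetical helpers: valid only for x >= 0 and 0 <= k < 31.
int div_pow2_nonneg(int x, int k) {
    return x >> k;                 // same as x / (1 << k) when x >= 0
}

int mod_pow2_nonneg(int x, int k) {
    return x & ((1 << k) - 1);     // same as x % (1 << k) when x >= 0
}
```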
But what if X is negative? I couldn't find this on the internet, but I need to know it. I want to do integer division and modulus with SSE, and the divisors are powers of two (though they may not be known at compile time).
Note: for the modulus, I'm looking for the actual C++ behavior, for example -1 % 3 == -1 and -5 % 3 == -2.
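For reference, these are the semantics that C++11 and later guarantee for the built-in operators. Division truncates toward zero, and (a / b) * b + a % b == a, so the remainder takes the sign of the dividend. The same rule applies to power-of-two divisors:

```cpp
// Built-in integer division truncates toward zero (C++11 and later), and
// (a / b) * b + a % b == a, so a % b carries the sign of a.
static_assert(-1 / 3 == 0,  "truncates toward zero");
static_assert(-1 % 3 == -1, "remainder has the sign of the dividend");
static_assert(-5 / 3 == -1, "-5 = -1 * 3 + (-2)");
static_assert(-5 % 3 == -2, "-5 = -1 * 3 + (-2)");
static_assert(-7 / 8 == 0 && -7 % 8 == -7,  "same rule for power-of-two divisors");
static_assert(-9 / 8 == -1 && -9 % 8 == -1, "-9 = -1 * 8 + (-1)");
```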
Looking at clang x86-64, this function
int divide(int x) {
return x / 8;
}
compiles to
lea eax, [rdi + 7]
test edi, edi
cmovns eax, edi // move edi to eax if edi is non-negative?
sar eax, 3 // I can only understand this part!
ret
and for the modulus x % 8:
mov eax, edi
lea ecx, [rax + 7]
test edi, edi
cmovns ecx, edi
and ecx, -8
sub eax, ecx
ret
Can someone explain the formula, or just give a reference to where it comes from? Thanks in advance.
The C++ signed modulus and division are (in my opinion) defined poorly. You get a much more natural structure if the modulus is always non-negative and division always rounds down toward negative infinity. I guess you can blame processor manufacturers for their choice of signed division instructions.
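To make the contrast concrete, here is a sketch of the "natural" floor division and non-negative modulus built on top of C++'s truncating operators. The names floor_div and floor_mod are hypothetical, and the sketch assumes d > 0:

```cpp
// Hypothetical helpers, assuming d > 0.
// floor_div rounds toward negative infinity; floor_mod is always in [0, d).
int floor_div(int x, int d) {
    int q = x / d;                 // C++ truncates toward zero
    if (x % d != 0 && x < 0) --q;  // step down when a negative x is not an exact multiple
    return q;
}

int floor_mod(int x, int d) {
    int r = x % d;                 // in C++ the sign of r matches x
    return r < 0 ? r + d : r;      // shift negative remainders into [0, d)
}
```

With these, -5 == floor_div(-5, 3) * 3 + floor_mod(-5, 3) still holds, but the remainder is 1 instead of -2.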
One case where this shows up is the signed arithmetic shift, which always rounds toward negative infinity. So to emulate C++'s division, which always rounds toward zero, you have to split the division into two cases:
x >= 0: floor(x / d)
x < 0: floor((x + d - 1) / d), which equals ceil(x / d)
so that the result always rounds toward zero. That explains the following:
lea eax, [rdi + 7] // t = x + 7
test edi, edi // set flags from x
cmovns eax, edi // if x >= 0, t = x (so t = x < 0 ? x + 7 : x)
sar eax, 3 // ret = floor(t / 8)
ret
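The same two-case trick works when the shift count is only known at run time. Below is a minimal scalar sketch that replaces the test/cmov pair with a branch-free sign mask, which also maps directly onto SIMD. It assumes 32-bit ints, 0 <= k < 31, and an arithmetic right shift for negative values (guaranteed since C++20, and what mainstream compilers do before that). The name div_pow2 is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: computes x / (1 << k) with C++ round-toward-zero
// semantics. Assumes 0 <= k < 31 and an arithmetic right shift.
int32_t div_pow2(int32_t x, int k) {
    // x >> 31 is all ones when x < 0, so bias is d - 1 for negative x, else 0.
    int32_t bias = (x >> 31) & ((int32_t(1) << k) - 1);
    return (x + bias) >> k;        // floor((x + bias) / d)
}
```

For example, div_pow2(-9, 3) computes (-9 + 7) >> 3 == -1, matching -9 / 8.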
Now, when computing the modulo, we have exactly the same scenario. In two's complement, taking the bottom k bits when computing a modulo 2^k already gives the always non-negative modulo. However, C++ (because of its unfortunate choice to make division round toward zero) has a strange modulo whose sign always matches the numerator.
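The difference is easy to see for a negative dividend. A small sketch with d = 8 (k = 3), assuming two's complement ints (guaranteed since C++20):

```cpp
// The low three bits give the floor ("always non-negative") modulo,
// while the built-in % follows the sign of the dividend.
static_assert((-1 & 7) == 7,   "bottom bits: -1 = -1 * 8 + 7");
static_assert(-1 % 8 == -1,    "C++: -1 = 0 * 8 + (-1)");
static_assert((-13 & 7) == 3,  "bottom bits: -13 = -2 * 8 + 3");
static_assert(-13 % 8 == -5,   "C++: -13 = -1 * 8 + (-5)");
```

When the C++ remainder is nonzero and negative, the two results differ by exactly d (3 - 8 == -5, 7 - 8 == -1).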
To simulate C++'s modulo, we instead compute the remainder as x - round_to_zero(x / d) * d. That is what the assembly does:
mov eax, edi // orig = x
lea ecx, [rax + 7] // Same as before, see above.
test edi, edi
cmovns ecx, edi
// sar ecx, 3 // As an optimization we
// shl ecx, 3 // replace (t >> 3) << 3
and ecx, -8 // with t & ~((1 << 3) - 1), i.e. t & -8
sub eax, ecx // ret = orig - round_to_zero(x / 8) * 8
ret
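Since the question asks about SSE with a divisor that may only be known at run time, here is a sketch of the same two steps on four packed 32-bit ints using SSE2 intrinsics. It assumes an x86 target with SSE2 and 0 <= k < 31. The names div_pow2_epi32 and mod_pow2_epi32 are hypothetical. Instead of the scalar test/cmov pair, it builds the bias from a per-lane sign mask:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: per-lane x / (1 << k), rounding toward zero like C++.
// Assumes 0 <= k < 31.
__m128i div_pow2_epi32(__m128i x, int k) {
    __m128i count = _mm_cvtsi32_si128(k);          // run-time shift count
    __m128i sign  = _mm_srai_epi32(x, 31);         // all ones in lanes where x < 0
    __m128i mask  = _mm_set1_epi32((1 << k) - 1);  // d - 1
    __m128i bias  = _mm_and_si128(sign, mask);     // d - 1 if x < 0, else 0
    return _mm_sra_epi32(_mm_add_epi32(x, bias), count);  // floor((x + bias) / d)
}

// Hypothetical helper: per-lane x % (1 << k) with C++ semantics
// (the result has the sign of x). Assumes 0 <= k < 31.
__m128i mod_pow2_epi32(__m128i x, int k) {
    __m128i sign = _mm_srai_epi32(x, 31);
    __m128i mask = _mm_set1_epi32((1 << k) - 1);
    __m128i t    = _mm_add_epi32(x, _mm_and_si128(sign, mask));  // x + bias
    __m128i qd   = _mm_andnot_si128(mask, t);  // t & ~(d - 1) == round_to_zero(x / d) * d
    return _mm_sub_epi32(x, qd);               // x - round_to_zero(x / d) * d
}
```

The mask-and-subtract in mod_pow2_epi32 mirrors the `and ecx, -8` / `sub eax, ecx` pair in the compiler output.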