Mod and Division by power of two for any signed number

I know the formula for positive numbers. For example, given a positive integer X and a power of two Y = 2^K, we have

X / Y == X >> K
X % Y == X & (Y - 1)

But what if X is negative? I couldn't find this on the internet, but I need to know it. I want to do integer division and modulus with SSE, and the divisors are powers of two (though they may not be known at compile time).

Note: for modulus, I'm looking for the actual C++ behavior, for example -1 % 3 == -1 and -5 % 3 == -2.

Looking at clang x86-64, this function

int divide(int x) {
    return x / 8;
}

compiles to

    lea     eax, [rdi + 7]
    test    edi, edi
    cmovns  eax, edi       // move edi to eax if edi is positive?
    sar     eax, 3         // i can only understand this part!
    ret

and for modulus x % 8,

    mov     eax, edi
    lea     ecx, [rax + 7]
    test    edi, edi
    cmovns  ecx, edi
    and     ecx, -8
    sub     eax, ecx
    ret

Can someone explain the formula, or just give a reference to where this comes from? Thanks in advance.

Solution

The C++ signed modulus / division is (in my opinion) defined poorly. You get a much more natural structure if the modulus is always non-negative and division always rounds toward negative infinity. I guess you can blame processor manufacturers for their choice of signed division instructions.

One case where this shows up is with the signed arithmetic shift. This naturally always rounds toward negative infinity. So to emulate C++'s division, which always rounds toward zero, you have to split the division by d = 2^k into two cases:

If x is non-negative, we shift as normal.
If x is negative, we compute floor((x + d - 1) / d) to emulate ceil(x / d), such that we round toward zero.
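The two cases above can be written out directly in C++. This is only a minimal sketch: `div_pow2` is a hypothetical helper name, k is assumed to be in the range 0..30, and `>>` on a negative `int32_t` is assumed to be an arithmetic shift (guaranteed since C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cstdint>

// Hypothetical helper: computes x / (1 << k) with C++ semantics
// (rounding toward zero). Assumes 0 <= k <= 30 and arithmetic >>.
int32_t div_pow2(int32_t x, int k) {
    int32_t d = int32_t{1} << k;
    if (x >= 0) {
        return x >> k;            // floor and truncation agree here
    }
    // For negative x, adding d - 1 first turns the floor into a ceiling.
    // x is negative and d - 1 is non-negative, so the sum cannot overflow.
    return (x + (d - 1)) >> k;
}
```

With k = 3 this adds 7 to negative inputs before shifting, which matches the `lea` / `cmovns` / `sar` sequence in the question.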

So that explains the following:

    lea     eax, [rdi + 7]  // t = x + 7
    test    edi, edi        // set the sign flag from x
    cmovns  eax, edi        // if x >= 0, use x instead of t
    sar     eax, 3          // ret = floor((x < 0 ? t : x) / 8)
    ret

Now, when computing the modulo we have exactly the same scenario. In two's complement, taking the bottom k bits when doing modulo 2^k already gives the always non-negative modulo. However, C++ (because of its unfortunate choice to make division round toward zero) has a strange modulo whose sign always matches the numerator.
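To make the difference concrete, here is a small sketch of the "bottom k bits" modulo. The name `low_bits_mod` is hypothetical, and k is assumed to be in the range 0..30.

```cpp
#include <cstdint>

// Hypothetical helper: keeps the low k bits of x, which in two's
// complement is x mod 2^k in the range [0, 2^k). Assumes 0 <= k <= 30.
int32_t low_bits_mod(int32_t x, int k) {
    int32_t mask = (int32_t{1} << k) - 1;
    return x & mask;
}
```

For x = -1 and k = 3 this returns 7, whereas the C++ expression -1 % 8 evaluates to -1. The two results only agree when x is non-negative or an exact multiple of 2^k.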

To simulate that we instead compute the remainder as x - round_to_zero(x / d) * d. So that's what the assembly does:

    mov     eax, edi        // orig = x
    lea     ecx, [rax + 7]  // Same as before, see above.
    test    edi, edi        
    cmovns  ecx, edi
    // sar  ecx, 3          // As an optimization we
    // shl  ecx, 3          // replace (x >> 3) << 3
    and     ecx, -8         // with x & ~((1 << 3) - 1).
    sub     eax, ecx        // ret = orig - round_to_zero(x / 8) * 8
    ret
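Since the question mentions divisors that are not known at compile time, here is a branchless sketch of the same two computations for a runtime k. The names `div_pow2_branchless` and `mod_pow2_branchless` are hypothetical, k is assumed to be in the range 0..30, and `>>` on a negative `int32_t` is assumed to be an arithmetic shift. Instead of a conditional move, the bias is built from the sign bit.

```cpp
#include <cstdint>

// Hypothetical helper: x / (1 << k), rounding toward zero.
int32_t div_pow2_branchless(int32_t x, int k) {
    int32_t mask = (int32_t{1} << k) - 1;  // d - 1
    int32_t bias = (x >> 31) & mask;       // d - 1 if x < 0, else 0
    return (x + bias) >> k;
}

// Hypothetical helper: x % (1 << k), sign follows x as in C++.
int32_t mod_pow2_branchless(int32_t x, int k) {
    int32_t mask = (int32_t{1} << k) - 1;
    int32_t bias = (x >> 31) & mask;
    int32_t rounded = (x + bias) & ~mask;  // round_to_zero(x / d) * d
    return x - rounded;
}
```

Every step here is an add, a subtract, a shift, or a bitwise AND, so it should carry over to SSE2 one 32-bit lane at a time, for example with `_mm_srai_epi32`, `_mm_and_si128`, `_mm_add_epi32`, and `_mm_sub_epi32`. I have not benchmarked a vectorized version, so treat that mapping as a starting point rather than a tuned implementation.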