math bit micro-optimization low-level integer-division

Divide by 10 using bit shifts?


Is it possible to divide an unsigned integer by 10 using only bit shifts, addition, subtraction, and maybe multiplication? I'm using a processor with very limited resources and a slow divide.


Solution

  • Editor's note: this is not actually what compilers do, and gives the wrong answer for large positive integers ending with 9, starting with div10(1073741829) = 107374183 not 107374182 (Godbolt). It is exact for inputs smaller than 0x40000005, though, which may be sufficient for some uses.

    Compilers (including MSVC) do use fixed-point multiplicative inverses for constant divisors, but they use a different magic constant and shift on the high-half result to get an exact result for all possible inputs, matching what the C abstract machine requires. See Granlund & Montgomery's paper on the algorithm.

    See Why does GCC use multiplication by a strange number in implementing integer division? for examples of the actual x86 asm gcc, clang, MSVC, ICC, and other modern compilers make.
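
    As a rough illustration of the exact approach the note describes, here is a minimal sketch for unsigned 32-bit inputs. The constant 0xCCCCCCCD combined with a total right shift of 35 is the pair commonly seen in compiler output for unsigned division by 10. The name div10_exact is made up for this example, and the sketch assumes a 64-bit intermediate product is available.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: exact n / 10 for every 32-bit unsigned n.
     * 0xCCCCCCCD is ceil(2^35 / 10). Taking the high half of the product
     * (>> 32) and shifting 3 more bits is written here as a single >> 35. */
    uint32_t div10_exact(uint32_t n)
    {
        return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
    }
    ```

    With this constant, div10_exact(1073741829) gives 107374182, which is the input where the faster approximation below first goes wrong.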


    This is a fast approximation that's inexact for large inputs

    It's even faster than the exact division via multiply + right-shift that compilers use.

    You can use the high half of a multiply result for divisions by small integral constants. Assume a 32-bit machine (code can be adjusted accordingly):

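    #include <stdint.h>

    /* Approximate dividend / 10. Exact only for 0 <= dividend < 0x40000005;
       negative dividends are not handled correctly. */
    int32_t div10(int32_t dividend)
    {
        int64_t invDivisor = 0x1999999A;  /* about 2^32 / 10, rounded up */
        return (int32_t) ((invDivisor * dividend) >> 32);
    }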
    

    What's going on here is that we're multiplying by a close approximation of 1/10 * 2^32 and then removing the 2^32 by keeping only the high half of the product. This approach can be adapted to different divisors and different bit widths.
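
    To make the "different divisors and different bit widths" remark concrete, here is a minimal sketch for 16-bit unsigned inputs, which may suit a small processor better. The function names and the constants 0xCCCD (shift 19) and 0xAAAB (shift 17) are assumptions chosen for this example rather than anything from the original answer. Because the input range is small, the sketch includes an exhaustive self-check against the built-in operators.

    ```c
    #include <stdint.h>

    /* Hypothetical 16-bit variant: 0xCCCD is ceil(2^19 / 10), and the
     * product of two 16-bit values fits in 32 bits. */
    uint16_t div10_u16(uint16_t n)
    {
        return (uint16_t)(((uint32_t)n * 0xCCCDu) >> 19);
    }

    /* Hypothetical variant for another divisor: 0xAAAB is ceil(2^17 / 3). */
    uint16_t div3_u16(uint16_t n)
    {
        return (uint16_t)(((uint32_t)n * 0xAAABu) >> 17);
    }

    /* Compares both helpers with / for every 16-bit input.
     * Returns 1 if they all match, 0 otherwise. */
    int check_u16_variants(void)
    {
        for (uint32_t n = 0; n <= 0xFFFFu; ++n) {
            if (div10_u16((uint16_t)n) != n / 10) return 0;
            if (div3_u16((uint16_t)n) != n / 3) return 0;
        }
        return 1;
    }
    ```

    Checking every input like this is cheap at 16 bits, and it is a reasonable way to validate any other constant and shift pair before relying on it.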

    This works well on the ia32 architecture, since its one-operand IMUL instruction puts the 64-bit product into edx:eax, and edx then holds the wanted value. Viz. (assuming the dividend is passed in ecx (fastcall) and the quotient is returned in eax):

    div10 proc 
        mov    eax,1999999Ah    ; 1/10 * 2^32
        imul   ecx              ; edx:eax = dividend / 10 * 2^32
        mov    eax,edx          ; eax = dividend / 10 (high half of the product)
        ret
    div10 endp
    

    Even on a machine with a slow multiply instruction, this is typically faster than a software divide, and often faster than a hardware divide as well.
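
    If multiplication is also too expensive, the shifts-and-adds-only route from the question is possible as well. The sketch below follows the divu10 routine published in Henry S. Warren's Hacker's Delight. It is a different technique from the reciprocal multiply above, the name div10_shift_add is made up for this example, and it assumes 32-bit unsigned inputs.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: n / 10 using only shifts, adds and subtracts. */
    uint32_t div10_shift_add(uint32_t n)
    {
        uint32_t q, r;

        q = (n >> 1) + (n >> 2);        /* q = n * 0.75 (truncated)         */
        q = q + (q >> 4);               /* extend the binary expansion of   */
        q = q + (q >> 8);               /* 0.8 = 0.110011001100... so that  */
        q = q + (q >> 16);              /* q approaches n * 0.8 from below  */
        q = q >> 3;                     /* n * 0.8 / 8 = n / 10, maybe low  */
        r = n - (((q << 2) + q) << 1);  /* r = n - q * 10                   */
        return q + (r > 9);             /* fix up the underestimate         */
    }
    ```

    The final correction step is what makes the result exact: the shifts alone can underestimate the quotient by one.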