cinteger-division128-bitint128

Divide 64-bit integers as though the dividend is shifted left 64 bits, without having 128-bit types


Apologies for the confusing title. I'm not sure how to better describe what I'm trying to accomplish. I'm essentially trying to do the reverse of getting the high half of a 64-bit multiplication in C for platforms where

int64_t divHi64(int64_t dividend, int64_t divisor) {
    return ((__int128)dividend << 64) / (__int128)divisor;
}

isn't possible due to lacking support for __int128.


Solution

  • This can be done without a multi-word division

    Suppose we want to do ⌊264 × xy⌋ then we can transform the expression like this

    Unicode math: ⌊(2^64 x)/y⌋=⌊(⌊2^64/y⌋+{2^64/y})x⌋=⌊2^64/y⌋x+⌊{2^64/y}x┤

    The first term is trivially done as ((-y)/y + 1)*x as per this question How to compute 2⁶⁴/n in C?

    The second term is equivalent to (264 % y)/y*x and is a little bit trickier. I've tried various ways but all need 128-bit multiplication and 128/64 division if using only integer operations. That can be done using the algorithms to calculate MulDiv64(a, b, c) = a*b/c in the below questions

    However they may be slow, and if you have those functions you calculate the whole expression more easily like MulDiv64(x, UINT64_MAX, y) + x/y + something without messing up with the above transformation

    Using long double seems to be the easiest way if it has 64 bits of precision or more. So now it can be done by (264 % y)/(long double)y*x

    uint64_t divHi64(uint64_t x, uint64_t y) {
        uint64_t mod_y = UINT64_MAX % y + 1;
        uint64_t result = ((-y)/y + 1)*x;
        if (mod_y != y)
            result += (uint64_t)((mod_y/(long double)y)*x);
        return result;
    }
    

    The overflow check was omitted for simplification. A slight modification will be needed if you need signed division


    If you're targeting 64-bit Windows but you're using MSVC which doesn't have __int128 then now it has a 128-bit/64-bit divide intrinsic which simplifies the job significantly without a 128-bit integer type. You still need to handle overflow though because the div instruction will throw an exception on that case

    uint64_t divHi64(uint64_t x, uint64_t y) {
        uint64_t high, remainder;
        uint64_t low = _umul128(UINT64_MAX, y, &high);
        if (x <= high /* && 0 <= low */)
            return _udiv128(x, 0, y, &remainder);
        // overflow case
        errno = EOVERFLOW;
        return 0;
    }
    

    The overflow checking above is can be simplified to checking whether x < y, because if x >= y then the result will overflow


    See also


    Exhaustive tests on 16/16 bit division shows that my solution works correctly for all cases. However you do need double even though float has more than 16 bits of precision, otherwise occasionally a less-than-one result will be returned. It may be fixed by adding an epsilon value before truncating: (uint64_t)((mod_y/(long double)y)*x + epsilon). That means you'll need __float128 (or the -m128bit-long-double option) in gcc for precise 64/64-bit output if you don't correct the result with epsilon. However that type is available on 32-bit targets, unlike __int128 which is supported only on 64-bit targets, so life will be a bit easier. Of course you can use the function as-is if just a very close result is needed

    Below is the code I've used for verifying

    #include <thread>
    #include <iostream>
    #include <limits>
    #include <climits>
    #include <mutex>
    
    std::mutex print_mutex;
    
    #define MAX_THREAD 8
    #define NUM_BITS   27
    #define CHUNK_SIZE (1ULL << NUM_BITS)
    
    // typedef uint32_t T;
    // typedef uint64_t T2;
    // typedef double D;
    typedef uint64_t T;
    typedef unsigned __int128 T2;   // the type twice as wide as T
    typedef long double D;
    // typedef __float128 D;
    const D epsilon = 1e-14;
    T divHi(T x, T y) {
        T mod_y = std::numeric_limits<T>::max() % y + 1;
        T result = ((-y)/y + 1)*x;
        if (mod_y != y)
            result += (T)((mod_y/(D)y)*x + epsilon);
        return result;
    }
    
    void testdiv(T midpoint)
    {
        T begin = midpoint - CHUNK_SIZE/2;
        T end   = midpoint + CHUNK_SIZE/2;
        for (T i = begin; i != end; i++)
        {
            T x = i & ((1 << NUM_BITS/2) - 1);
            T y = CHUNK_SIZE/2 - (i >> NUM_BITS/2);
            // if (y == 0)
                // continue;
            auto q1 = divHi(x, y);
            T2 q2 = ((T2)x << sizeof(T)*CHAR_BIT)/y;
            if (q2 != (T)q2)
            {
                // std::lock_guard<std::mutex> guard(print_mutex);
                // std::cout << "Overflowed: " << x << '&' << y << '\n';
                continue;
            }
            else if (q1 != q2)
            {
                std::lock_guard<std::mutex> guard(print_mutex);
                std::cout << x << '/' << y << ": " << q1 << " != " << (T)q2 << '\n';
            }
        }
        std::lock_guard<std::mutex> guard(print_mutex);
            std::cout << "Done testing [" << begin << ", " << end << "]\n";
    }
    
    
    uint16_t divHi16(uint32_t x, uint32_t y) {
        uint32_t mod_y = std::numeric_limits<uint16_t>::max() % y + 1;
        int result = ((((1U << 16) - y)/y) + 1)*x;
        if (mod_y != y)
            result += (mod_y/(double)y)*x;
        return result;
    }
    
    void testdiv16(uint32_t begin, uint32_t end)
    {
        for (uint32_t i = begin; i != end; i++)
        {
            uint32_t y = i & 0xFFFF;
            if (y == 0)
                continue;
            uint32_t x = i & 0xFFFF0000;
            uint32_t q2 = x/y;
            if (q2 > 0xFFFF) // overflowed
                continue;
            
            uint16_t q1 = divHi16(x >> 16, y);
            if (q1 != q2)
            {
                std::lock_guard<std::mutex> guard(print_mutex);
                std::cout << x << '/' << y << ": " << q1 << " != " << q2 << '\n';
            }
        }
    }
    
    int main()
    {
        std::thread t[MAX_THREAD];
        for (int i = 0; i < MAX_THREAD; i++)
            t[i] = std::thread(testdiv, std::numeric_limits<T>::max()/MAX_THREAD*i);
        for (int i = 0; i < MAX_THREAD; i++)
            t[i].join();
        
        std::thread t2[MAX_THREAD];
        constexpr uint32_t length = std::numeric_limits<uint32_t>::max()/MAX_THREAD;
        uint32_t begin, end = length;
        
        for (int i = 0; i < MAX_THREAD - 1; i++)
        {
            begin = end;
            end  += length;
            t2[i] = std::thread(testdiv16, begin, end);
        }
        t2[MAX_THREAD - 1] = std::thread(testdiv, end, UINT32_MAX);
        for (int i = 0; i < MAX_THREAD; i++)
            t2[i].join();
        std::cout << "Done\n";
    }