cwebassemblytwos-complement

Portable implementation of computing high 64 bits in widening 128 bit multiplication


How could I implement signed and unsigned multiplication of two 64 bit numbers, widened into 128 bits, and discarding the low 64 bits, returning the high 64 bits? The multiplication must properly wrap according to twos complement, without exhibiting undefined behavior.

#include <stdint.h>

int64_t imul128hi(int64_t lhs, int64_t rhs);
uint64_t mul128hi(uint64_t lhs, uint64_t rhs);

Simple portable C with stdint.h is required, no unreasonable __builtin instrinsics. I am using this code as a stub in the output of a WASM JIT compiler. Don't use ((uint128_t)x * y) >> 64 please. The goal here is to emulate imul and mul instructions.

# printf '\x49\xf7\xe8' | ndisasm -b 64 -
00000000  49F7E8            imul r8
# printf '\x49\xf7\xe0' | ndisasm -b 64 -
00000000  49F7E0            mul r8

Solution

  • Back to long-multiplication... with base 2^32

         a    b                  // a, b, c, and d are
    x    c    d                  // 32-bit values (in a 64-bit variable)
    -----------
        ad   bd                  // all numbers here are 64-bit values
    ac  bc                       // "main" number in the low 32 bits
    -----------                  // "carry" in the high 32 bits
    ac ad+bc bd
    |   |    \---- bits 0 to 31 (with carry in bits 32 to 63)
    |   \--------- bits 32 to 63 (with carry in bits 64 to 95)
    \------------- bits 64 to 95 (with carry in bits 96 to 127)
    

    --

    uint64_t mul128hi(uint64_t lhs, uint64_t rhs) {
        uint64_t bd = (lhs & 0xffffffff) * (rhs & 0xffffffff);
        uint64_t ad = (lhs >> 32) * (rhs & 0xffffffff);
        uint64_t bc = (lhs & 0xffffffff) * (rhs >> 32);
        uint64_t ac = (lhs >> 32) * (rhs >> 32);
    
        // correction by chux incorporated, thanks
        uint64_t mid_sum = bc + (bd >> 32);  // This never overflows.
        uint64_t carry = ad > UINT64_MAX - mid_sum ? 0x100000000: 0;
        uint64_t ad_plus_bc = ad + mid_sum; // ad + mid_sum may overflow.
        return ac + (ad_plus_bc >> 32) + carry;
    }