I want to divide uint64 by a uint32 in WebGPU, which currently only handles uint32 operations. So I am trying to simulate this by storing my long into two uint32 buffers high
and low
. and now I want to divide them by a uint32 b
I saw this answer, but I am having a hard time replicating it. I think for it to work, I should do (A / Y)x3 + (((A % Y)x + B) / Y)x2 + (((((A % Y)x + B) % Y)x + C) / Y)x + ((((((A % Y)x + B) % Y)x + C) / Y)x + D) / Y where x is 216. With that, A is the first 16 bits of high
, and B is the lowest 16 bits of high
. Similarly, C is the first 16 bits of low
and D is the last 16 bits of low
.
Here's a C code of the algorithm:
void u64DivU32(uint32_t& low, uint32_t& high, uint32_t& Y, uint32_t& res_low, uint32_t& res_high) {
uint32_t a, b, c, d;
a = high >> 16;
b = (high << 16) >> 16;
c = (low >> 16);
d = (low << 16) >> 16;
// compute high
uint32_t t0 = a; // A
uint32_t t1 = (((t0 % Y) << 16) + b); // ((A % Y)x + B)
// (A / Y)x3 + (((A % Y)x + B) / Y)x2
res_high = ((t0 / Y) << 16) + (t1/Y);
// compute low
uint32_t t2 = (((t1 % Y) << 16) + c); // (((((A % Y)x + B) % Y)x + C) / Y)x
uint32_t t3 = (((t2 / Y) << 16) + d); // ((((((A % Y)x + B) % Y)x + C) / Y)x + D) / Y
res_low = ((t2 / Y) << 16) + (t3/Y);
}
However, it did not work. After a few tests, res_high
is working as intended, but res_low
is always off. I am also not sure how to apply this correctly when b is larger than 65535
I see two issues.
First, you reused b
, but that must have just been when entering the question as you wouldn't get anywhere close.
The 2nd error is you put in a / instead of % for t3
calculation
void u64DivU32(uint32_t& low, uint32_t& high, uint32_t& divisor, uint32_t& res_low, uint32_t& res_high) {
uint32_t a, b, c, d;
a = high >> 16;
b = (high << 16) >> 16;
c = (low >> 16);
d = (low << 16) >> 16;
// compute high
uint32_t t0 = a; // A
uint32_t t1 = (((t0 % divisor) << 16) + b); // ((A % Y)x + B)
// (A / Y)x3 + (((A % Y)x + B) / Y)x2
res_high = ((t0 / divisor) << 16) + (t1/divisor);
// compute low
uint32_t t2 = (((t1 % divisor) << 16) + c);
uint32_t t3 = (((t2 % divisor) << 16) + d);
res_low = ((t2 / divisor) << 16) + (t3/divisor);
}