I need to perform a many operations of finding remainders of the division unsigned long long
number by the 16-bit modulus:
unsigned long long largeNumber;
long residues[100];
unsigned long modules[100];
intiModules(modules); //set different 16-bit values
for(int i = 0; i < 100; i++){
residues[i] = largeNumber % modules[i];
}
How I can accelerate this loop?
The iteration count is not large (32-128), but this loop performed very often so its speed is critical.
Division by a constant (and there are only 65536 of them) can be performed by multiplication of the reciprocal followed/preceded by some fine-tuning. Since this method is accurate for a limited range, one can use some techniques to reduce the 64-bit operand to a much smaller value (which is still congruent to the original value):
// pseudo code -- not c
a = 0x1234567890abcdefULL;
a = 0x1234 << 48 + 0x5678 << 32 + 0x90ab << 16 + 0xcdef;
a % N === ((0x1234 * (2^48 % N) + // === means 'is congruent'
(0x5678 * (2^32 % N)) + // ^ means exponentation
(0x90ab * (2^16 % N)) +
(0xcdef * 1)) % N;
The intermediate value can be calculated with (small) multiplications only and the final remainder (%N) has potential to be calculated with reciprocal multiplication.