I have a very big unsigned binary big integer in the scale of 6*10^120. Assume the big integer is stored in a struct of many QWORD (8 bytes) unsigned integers or several YMM registers.
I want to display it in decimal (not binary) scientific notation like 6E120
. The mantissa is always 1 digit and must be the leading digit of the full decimal representation; truncate it to 1 significant digit, not rounding to nearest. The exponent is always 3 digits. The format is aExyz like 8E095.
What is the most time-efficient (fastest) algorithm to find the order of magnitude (power of 10) and leading decimal digit? I am asking for the algorithm, not for the program, I will write it myself.
It will be done in MASM64 assembly language. If there are instructions that can help like bit manipulations or FPU/SSE/AVX512 tricks, please do suggest them.
This is not a high-level program so any responses that include third-party libraries or high-level language constructs are not helpful. I know a certain algorithm involves many divisions. Those are expensive in ASM so I'm looking for an alternative. I know how to convert from binary to decimal and then to scientific notation. I'm trying to avoid that middle step.
Assuming the maximum possible value is less than 1E154, and thus all values fit in 512 bits, than I suspect the answer might be:
powers_of_10
in a static const array. (#ops ~0)
clz
as well) (#ops < ~10)(max-bits - number_of_leading_zeroes) / 3.32192809489
to give a good estimate of the final number of decimal digits. This is also a good estimate a close power of 10. (#ops ~2)powers_of_10
from that estimate until you find the largest power-of-10 less than your value. (#ops ~8)
uint64
s)
double(input)/double(power_of_ten)
will do it in a single division.loop_count-1
E
power_of_ten_index
. (#ops ~4)If you're willing to sacrifice accuracy in both exponent and mantissa, that leaves ~16 operations that completely ignore the low bits.
Without writing out a final implementation, performance is hard to guess at, and with a large LUT, cache, and therefore the rest of the program becomes a factor, but here's preliminary data: https://quick-bench.com/q/53k-xSQz7y4iCO7ny66Dz2w62dQ (I ran it several times to try to eliminate outliers)
In my tests, the fastest combination seems to (unsurprisingly) be:
powers_of_ten
to confirm the exponent.From this baseline, we can see:
powers_of_ten
seems to have no noticeable impact on time on average, as long as you also use floats to guess the mantissa. If you do not use floats to guess the mantissa, then the mantissa calculation will take significantly longer. This implies it is not adding significant accuracy on average, and can be skipped to minimize code size.It is also interesting to note that all of these are ~4x faster than converting the bigint to a double
and then using printf("%.0E",value);
(I do not vouch for the accuracy of the results of any of this code)