c++ algorithm multiplication fixed-point

What's the best multiplication algorithm for fixed point where precision is necessary?


I know, I know, people are probably going to say "just switch to floating point", but that is currently not an option due to the nature of the project I am working on. I am helping write a programming language in C++, and I am having difficulty getting a very accurate algorithm for multiplication. I have a VM whose relevant operations are mod/smod, div/sdiv (i.e. signed numbers are not a concern here), mul, a halving number for fully fractional numbers, and a pushed shift number that I multiply and divide by to do my shifting.

For simplicity, let's say I'm working with a 32-byte space. My algorithms work fine for pretty much anything involving integers; it's only when the fractional portion gets over 16 bytes that I run into precision problems. If I were to round, the number would be fairly accurate, but I want it as accurate as possible, and I'm even willing to sacrifice a bit of performance for it, so long as it stays fixed point and doesn't go into floating-point land.

I will map out the algorithms I'm concerned with in a sort of pseudocode. I would love any insight into how I could make this better, or any reasoning as to why, by the laws of computational science, what I'm asking for is a fruitless endeavor.

For fully fractional numbers (all bytes are fractional):

 A = num1 / halfShift // truncates each factor to 16 bytes so that the product is a full 32-byte number
 B = num2 / halfShift
 finalNum = A * B

For the rest of my numbers that are larger than 16 bytes I use this algorithm:

 this algorithm can essentially be broken down into the int.frac form,
 i.e. A.B * C.D, taking the mathematical form
 D*B/shift + C*A*shift + D*A + C*B
 if the fractional parts are larger than the integer parts, I halve them, then multiply them together in my D*B/shift term,
 just like in the fully fractional example above
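
To make that breakdown concrete, here is a minimal C++ sketch, assuming a much smaller 32-bit word split into 16 integer bits and 16 fractional bits (so `shift` is a bit count of 16 rather than a multiplier). The name `mulSplit` is made up for illustration; it is not the VM's actual code.

```cpp
#include <cstdint>

// A.B * C.D for an unsigned 16.16 fixed-point value, using only 32-bit arithmetic.
// Assumption: the true result fits in 32 bits; otherwise the sum wraps.
std::uint32_t mulSplit(std::uint32_t x, std::uint32_t y) {
    const unsigned shift = 16;
    std::uint32_t A = x >> shift, B = x & 0xFFFFu; // x = A.B
    std::uint32_t C = y >> shift, D = y & 0xFFFFu; // y = C.D
    std::uint32_t fracFrac = (D * B) >> shift;     // D*B/shift: the only term that drops bits
    std::uint32_t intInt   = (C * A) << shift;     // C*A*shift
    return fracFrac + intInt + D * A + C * B;      // cross terms need no scaling
}
```

With 16-bit halves every partial product fits in the word, so the only bits lost are the low 16 bits dropped from `D*B`.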

Is there some kind of "magic" rounding method that I should be aware of? Please let me know.


Solution

  • You get the most accurate result if you do the multiplication first and scale afterwards. Of course, that means you need to store the result of the multiplication in a type twice as wide as the operands (for 32-bit operands, a 64-bit int type). If that is not an option, your approach of shifting in advance makes sense, but you certainly lose precision.
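
    As a rough sketch of multiplying first and scaling afterwards, assuming 32-bit operands and an available 64-bit type (`mulWide` and `fracBits` are hypothetical names, not taken from the question):

    ```cpp
    #include <cstdint>

    // Multiply two unsigned fixed-point values that each carry fracBits fractional bits.
    // Assumption: fracBits <= 32 and the scaled result fits in 32 bits.
    std::uint32_t mulWide(std::uint32_t a, std::uint32_t b, unsigned fracBits) {
        // The 64-bit product is exact and carries 2 * fracBits fractional bits.
        std::uint64_t product = static_cast<std::uint64_t>(a) * b;
        // Scale once at the end, truncating the extra fractional bits.
        return static_cast<std::uint32_t>(product >> fracBits);
    }
    ```

    For example, in a 16.16 format `mulWide(0x00018000, 0x00024000, 16)` (1.5 times 2.25) yields `0x00036000` (3.375).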

    Either way, you can increase accuracy a little by rounding instead of truncating.

    I support Aconcagua's recommendation to round to nearest. For that, before you apply the division, you need to add the value of the highest bit that is going to be truncated (that is, half of the divisor).

    In your case that would look like this:

    // here halfshift is the number of bits to drop, not the divisor
    A = (num1 + (1 << (halfshift-1))) >> halfshift
    B = (num2 + (1 << (halfshift-1))) >> halfshift
    finalNum = A * B
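
    A minimal C++ sketch of the rounded pre-scaling, assuming 32-bit fully fractional values and a `halfshift` of 16 bits. `roundShift` and `mulPrescaled` are hypothetical names. Note that in C++ `+` binds tighter than `<<`, so the rounding constant needs its own parentheses. The sketch writes the rounding as "truncate, then add the highest dropped bit", which gives the same result but cannot wrap when the input is near the top of the range. The clamp to `0xFFFF` is an added assumption to keep the product within 32 bits.

    ```cpp
    #include <cstdint>

    // Divide by 2^shift, rounding to nearest (ties round up). Assumes 1 <= shift <= 31.
    // Same result as (value + (1 << (shift - 1))) >> shift, but the addition cannot wrap.
    std::uint32_t roundShift(std::uint32_t value, unsigned shift) {
        return (value >> shift) + ((value >> (shift - 1)) & 1u);
    }

    // Multiply two fully fractional 32-bit values by pre-scaling each to 16 bits.
    std::uint32_t mulPrescaled(std::uint32_t num1, std::uint32_t num2) {
        const unsigned halfshift = 16;
        std::uint32_t A = roundShift(num1, halfshift);
        std::uint32_t B = roundShift(num2, halfshift);
        // Assumption: rounding can yield 0x10000; clamp so that A * B stays within 32 bits.
        if (A > 0xFFFFu) A = 0xFFFFu;
        if (B > 0xFFFFu) B = 0xFFFFu;
        return A * B;
    }
    ```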
    

    EDIT:

    Example of how to dynamically scale the factors and the result depending on the values of the factors (this improves resolution and therefore the accuracy of the result):

    shiftA and shiftB need to be set such that A and B are 16-byte fractionals each, so that the 32-byte result cannot overflow. If shiftA and shiftB are not known in advance, they can be determined by counting the leading zeros of num1 and num2.

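    // shiftA and shiftB are bit counts; each must be at least 1 for the rounding term to be valid
    A = (num1 + (1 << (shiftA-1))) >> shiftA
    B = (num2 + (1 << (shiftB-1))) >> shiftB
    finalNum = (A * B) >> (fullshift - (shiftA + shiftB))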
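
    A sketch of the dynamic scaling, assuming 32-bit fully fractional values (so `fullshift` is 32 and each factor is reduced to at most 16 bits). All helper names are hypothetical, and the bit length is counted with a plain loop rather than a leading-zero intrinsic to keep the example portable.

    ```cpp
    #include <cstdint>

    // Number of significant bits in v (0 for v == 0); equals 32 minus the leading zeros.
    unsigned bitLength(std::uint32_t v) {
        unsigned n = 0;
        while (v != 0) { ++n; v >>= 1; }
        return n;
    }

    // Smallest shift that brings v down to at most 16 significant bits.
    unsigned shiftFor(std::uint32_t v) {
        unsigned len = bitLength(v);
        return len > 16 ? len - 16 : 0;
    }

    // Round-to-nearest shift for a shift chosen by shiftFor; skips rounding when shift is 0.
    std::uint32_t roundTo16(std::uint32_t v, unsigned shift) {
        if (shift == 0) return v;
        std::uint32_t r = (v >> shift) + ((v >> (shift - 1)) & 1u);
        return r > 0xFFFFu ? 0xFFFFu : r; // assumption: clamp if rounding carried into bit 16
    }

    std::uint32_t mulDynamic(std::uint32_t num1, std::uint32_t num2) {
        const unsigned fullshift = 32;
        unsigned shiftA = shiftFor(num1), shiftB = shiftFor(num2);
        std::uint32_t A = roundTo16(num1, shiftA);
        std::uint32_t B = roundTo16(num2, shiftB);
        unsigned finalShift = fullshift - (shiftA + shiftB); // in the range 0..32
        if (finalShift >= 32) return 0; // shifting a 32-bit value by 32 is undefined in C++
        return (A * B) >> finalShift;
    }
    ```

    For `num1 = 0x80000000` (0.5) and `num2 = 0x00012345`, this gives `0x91A3` against an exact value of `0x91A2.8`, whereas a fixed 16-bit pre-scale of both factors gives `0x8000`.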