c++ c optimization

Why is the modulo operator necessary?


I've read in a document that you can replace the mod operation with a bitwise AND, like this:

Instead of:

int Limit = Value % Range;

You do:

int Limit = Value & (Range-1);

But compilers still generate mod instructions, and my question is basically: why don't compilers use the most efficient approach if the two work the same?


Solution

  • You can replace modulo with that AND only if Range is a power of 2. In general, elementary math lets you compute the remainder without a modulo operator:

    a = b % c;
    

    can be done with

    x = b / c;
    a = b - (x*c);
    

    Let's check this with an example:

    25 % 7 = 
    25 / 7 = 3 (integer math)
    25 - (3 * 7) =
    25 - 21 = 4
    

    Which is how I have to do it on my calculator anyway, as I don't have a modulo operator.
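
    The divide-and-subtract steps above can be sketched in C++ as follows. The function name mod_by_divide is hypothetical and used only for illustration, and the sketch assumes c is non-zero.

    ```cpp
    // Hypothetical helper: computes b % c without using the % operator.
    // Assumes c != 0 (division by zero is undefined, as with % itself).
    int mod_by_divide(int b, int c) {
        int x = b / c;     // integer quotient, truncated toward zero
        return b - x * c;  // subtract the whole multiples of c to leave the remainder
    }
    ```

    With b = 25 and c = 7 this reproduces the example: the quotient is 3 and the result is 25 - 21 = 4.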

    Note that

    25 & (7-1) = 
    0x19 & 0x6 = 0x0
    

    So your substitution does not work here: the AND gives 0, but 25 % 7 is 4.
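
    To see where the AND form does and does not agree with modulo, here is a minimal sketch. The names is_power_of_two and mod_by_mask are hypothetical, and unsigned values are assumed so that sign handling does not enter into it.

    ```cpp
    // Hypothetical helper: true when range has exactly one bit set.
    bool is_power_of_two(unsigned range) {
        return range != 0 && (range & (range - 1)) == 0;
    }

    // Hypothetical helper: the AND substitution from the question.
    // It equals value % range only when range is a power of two.
    unsigned mod_by_mask(unsigned value, unsigned range) {
        return value & (range - 1);
    }
    ```

    For value = 25, a range of 8 gives 25 & 7 = 1, which matches 25 % 8, while a range of 7 gives 25 & 6 = 0, which does not match 25 % 7 = 4.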

    Not only do most processors not have a modulo instruction, many do not have a divide. Check out the book Hacker's Delight.

    Why would you want modulo in hardware? If you have already burned the hardware to make a divide, you might be willing to go the extra mile and add modulo as well. Most processors take your question to the next level: why implement a divide in hardware when it can be done in software? The answer to your question is that most processor families do not have a modulo, and many do not have a divide, because it is not worth the chip real estate, power consumed, etc. compared to the software solution. The software solution is less painful/costly/risky.

    Now I assume your question is not what the winning poster answered, and that you mean the cases where Range is a power of two and the identity does work.

    First off, if Range is not known at compile time, the compiler cannot know that it is a power of two, so it would be in error to replace the modulo with a subtract and an AND. Even when the identity holds, that is two operations and maybe an intermediate variable in place of a single modulo.

    If Range is a power of two and is known at compile time, your better/fancier compilers will optimize. There are exceptions, especially with a variable-length instruction set, where a smaller instruction may be chosen over a larger one. Values of Range that match your identity have a single bit set and the other bits zero (0x100, 0x40, 0x8000, etc.), while Range-1 has more non-zero bits, so it might be less painful to load Range and do a modulo than to load Range-1 and do an AND. The load immediate plus modulo might be cheaper than the load immediate plus AND, or the modulo immediate might be cheaper than the AND immediate. You have to examine the instruction set and how the compiler has implemented the solution.
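
    As a rough illustration of the compile-time case, the sketch below fixes the range at 64. The function names are hypothetical. One assumption worth stating: for unsigned operands the AND form is exactly equivalent, whereas for signed operands % keeps the sign of the dividend in C++, so a compiler cannot use a bare AND there and has to emit extra fix-up code or a real divide.

    ```cpp
    // Range is the compile-time constant 64, a power of two.
    // For unsigned values, value % 64 and value & 63 always agree,
    // so an optimizing compiler is free to emit a single AND here.
    unsigned wrap_unsigned(unsigned value) {
        return value % 64u;
    }

    // For signed values the two forms differ on negative inputs:
    // -1 % 64 is -1, while -1 & 63 is 63 on two's complement machines.
    int wrap_signed(int value) {
        return value % 64;
    }
    ```

    Comparing the generated assembly for these two functions at a given optimization level is one way to check what your compiler actually does.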

    I suggest you post some examples where the compiler is not doing the optimization; I assume we can post many examples where the compiler has done the optimization you were expecting.