c++ undefined-behavior memcmp

C++: Is there a difference between calling an operator and calling its implementation?


I have a class in which I overloaded the == operator using memcmp() on a specific member. Due to a bad copy elsewhere in the code (memcpy called with a larger size than it should have been), I got a segfault when invoking the == operator.

I understand that UB is mysterious and obviously undefined, but still there is something I noticed that intrigues me.

While debugging, I swapped the == call with its implementation (i.e. a==b was replaced with memcmp(a.member_x, b.member_x, SIZE)) and there was no segfault!

So, is there a difference between using the operator itself and replacing it with the implementation or is this just the UB?

To clarify: Yes, this code includes UB. It's bad and its results are undefined. What I want to know is: does something different happen when calling an operator versus calling its body? The UB just made me think that a difference might exist (and the bug has obviously been fixed).
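A minimal sketch of the two forms being compared, since the original code is not shown. The class name `Record`, the value of `SIZE`, and the helper functions `equal_via_operator` and `equal_via_body` are hypothetical; only `member_x`, `SIZE`, and the use of memcmp come from the description above. The sketch assumes the operator returns true when memcmp reports no difference.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical size; the real value is not given in the question.
constexpr std::size_t SIZE = 16;

// Hypothetical stand-in for the class described in the question.
struct Record {
    unsigned char member_x[SIZE];

    // Overloaded == implemented with memcmp on one member.
    bool operator==(const Record& other) const {
        return std::memcmp(member_x, other.member_x, SIZE) == 0;
    }
};

// Form 1: use the operator.
bool equal_via_operator(const Record& a, const Record& b) {
    return a == b;  // resolved to a.operator==(b)
}

// Form 2: write out the operator's body at the call site.
bool equal_via_body(const Record& a, const Record& b) {
    return std::memcmp(a.member_x, b.member_x, SIZE) == 0;
}
```

For well-formed objects the two functions return the same result, because `a == b` is resolved to a call of `Record::operator==`.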


Solution

  • Undefined Behavior means that "anything can happen". "Anything" includes "working just as intended". It can mean that you can get different behavior without changing anything, and it can mean that you get the same behavior even though you changed something.

    In the past, warnings about relying on undefined behavior have often included the proverbial "launching of the nuclear missiles".

    However, with modern aggressively optimizing compilers, the behavior can be much more subtle. In the past, undefined behavior would usually lead to "whatever happens happens". E.g. in your example, you would either read "junk" in memory if you are allowed to access it, or segfault if you aren't. But the operation (i.e. "compare these two chunks of memory") would still happen somehow.

    This is no longer "guaranteed" (not that there ever were any guarantees when it comes to UB). The compiler will not necessarily just emit the nonsensical operation and let whatever happens happen.

    A modern optimizing compiler must often decide (or prove) that a certain optimization is safe, i.e. that it doesn't alter observable specified behavior. And since UB means "anything can happen", the part of the optimizer that proves that certain optimizations are safe can "assume anything it wants". In essence, it can assume that all optimizations are safe, and then proceed however it wants to provide the most aggressive optimization possible.

    As a result, UB is much less predictable and much less obvious than it once was. For example, UB in one place in the program can lead the optimizer to transform the code in a way that changes the behavior of a different part of the program that is somehow connected to it (e.g. one calls the other, or both manipulate the same state).

    Let's say we have two threads manipulating shared mutable state. One of the two threads exhibits UB. Then, the optimizer can decide that that thread doesn't manipulate the state ("anything can happen", remember?) and since it can now prove that the state will only ever be accessed by one thread, it can optimize away all locks! [Note: I have no idea whether any compiler in reality does this, but it would be allowed to!]

    Here's another example to demonstrate that "anything can happen" really, truly does mean "anything": let's assume there are two possible optimizations that could be applied in some code higher up the stack which calls your operator==. One optimization is only valid if the compiler can prove that operator== will always be true. The other optimization is only valid if the compiler can prove that it will always be false. This means, of course, that neither optimization can be applied since in general, your operator== could return either true or false.

    But! We have UB. So, the compiler can decide to just assume that it will always be true and apply optimization #1. Or it could decide that it will always be false and apply optimization #2. Okay, fair enough. However, it can also decide to apply both optimizations! Remember: "anything can happen". Not just "anything that makes sense according to the logical framework of the C++ spec" but "anything" period. If the compiler needs something to be true and false at the same time, it is free to assume so in the presence of UB.

    You can think of a modern optimizing compiler as trying to prove theorems about your code, and then applying optimizations based on those proofs. And UB allows it to prove any and all theorems.
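A commonly cited, concrete instance of "the optimizer assumes the UB never happens" is signed integer overflow. The sketch below is not taken from the question, and the function names are hypothetical. Because overflowing an `int` is UB, a compiler is allowed to reason that `x + 1` never wraps and may reduce the first function to `return true`, even though a naive reading suggests it would return false for `INT_MAX`. Whether a given compiler actually does so depends on the compiler and the optimization level.

```cpp
#include <climits>

// Looks like an overflow check, but signed overflow is UB.
// An optimizer may therefore assume x + 1 never wraps and
// fold the whole comparison to `true`.
// Calling this with x == INT_MAX is undefined behavior.
bool next_is_larger(int x) {
    return x + 1 > x;
}

// Well-defined alternative: compare against the limit
// instead of performing the overflowing addition.
bool next_is_larger_checked(int x) {
    return x != INT_MAX;
}
```

The second function expresses the same intent without ever evaluating an overflowing expression, so its result is defined for every `int` value.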