[SOLVED] Signed integer overflow, intrinsics, and undefined behaviour

Signed integer overflow, intrinsics, and undefined behaviour

Is the very simple code below susceptible to undefined behaviour as the integer overflows as a result of the operation?

static volatile LONG x = LONG_MAX;

InterlockedIncrement(&x);

According to the standard, signed integer overflow is undefined behaviour. However, here we are out of the standard, as we are calling compiler's intrinsic function, which inlines to some assembly. Also, the value of x is not used anywhere (the function is just used as a memory barrier).

An answer to a similar question suggests this is not UB.

Solution

I claim there's no UB here, neither per the language standard (the standard doesn't cover this function/intrinsic) nor per the implementation, and there's a simple rollover.

Here's my reasoning...

InterlockedIncrement() is conceptually very simple and if it had a special case, it would be very hard to miss it and fail to document it. And the documentation hasn't mentioned any special case here for some 15+ years.

How would you implement it anyway?

If you're on the 80486 or better, the most natural implementation uses the XADD instruction with the LOCK prefix that atomically adds a value to a memory variable. The instruction by itself does not generate any overflow exception, however it does modify the EFLAGS register as does the regular addition instruction, ADD, so it's possible to detect an overflow and act on it. Specifically, you could throw in the INTO instruction to turn the overflow condition into an exception. Or you could use the conditional jump instruction, JO, to jump the overflow handler.

If you only have 80386 features, you have lock inc and lock add which can implement InterlockedIncrement only if the return value isn't used except to test it for zero/non-zero or negative/non-negative, or other condition you can derive from EFLAGS such as that signed-overflow happened (EFLAGS.OF). (In the general case where code does int tmp = InterlockedIncrement(&shared);, you'd need either lock xadd or a lock cmpxchg retry loop but those are both 486 or Pentium features. So you'd need a separate lock variable which all writers acquire and release, perhaps using the XCHG instruction (the LOCK is implicit with it) to take the lock. But this isn't compatible with the Interlocked... API which just exposes lock-free hardware operations, unlike std::atomic<int> which could have lock_free() be false when compiling for 386 and use a separate spinlock like it does for wide types. (In shared memory between processes, the spinlock would also need to be in shared memory so both processes could respect the same spinlock, so a library API can't just invent one for you, and it would force even operations like exchange to take/release the lock.)

Now, would you want to throw INTO or JO into all instances of InterlockedIncrement()? Probably not, definitely not by default. People like their atomic operations small and fast. And in some algorithms, there'd be no way to prevent the possibility of overflow with huge numbers of threads.

This is the "immediate" UB. What about the "creeping" UB? If you had C code like this:

  int a = INT_MAX;
  if (a + 1 < a)
    puts("Overflow!");

You'd likely get nothing printed nowadays. Modern compilers know that a + 1 can't legally(!) overflow and so the condition in the if statement can be taken as false irrespective of the value of a.

Can you have a similar optimization with InterlockedIncrement()?

Well, given that the variable is volatile and can indeed change in a different thread at any moment, the compiler may not assume unchanged a from two memory reads of it (you'd likely write a + 1 < a or similar as multiple statements and each a would need to be fetched if it's volatile).

It would also be an odd context to try to make the optimization in.