Tags: c++, atomic, twos-complement

How is std::atomic implemented on platforms that do not use two's complement? (C++11/14/17)


std::atomic::fetch_add() (or atomic_fetch_add() in C11) is defined to use two's complement for operations on signed integers. This is a different convention from ordinary signed integer arithmetic, where other representation schemes are allowed.

From N3337 [atomics.types.operations/30]:

Remark: For signed integer types, arithmetic is defined to use two's complement representation. There are no undefined results. For address types, the result may be an undefined address, but the operations otherwise have no undefined behavior.

My questions are:

(1) How is std::atomic::fetch_add() implemented on a platform that uses a representation of signed integers other than two's complement? Is the only option a non-lock-free implementation using a mutex, etc.?

(2) On a platform that uses a representation other than two's complement, when a negative value is read from an atomic variable, is it automatically converted to the platform's native format? Or is it the user's responsibility to convert it, since what is read out is the two's complement bit pattern?

std::atomic<int> a{0};
a.fetch_sub(1);
int x = a.load();  // The value of a is -1 (0xffffffff), so what is the value of x?

(3) I would like to know of example implementations of std::atomic::fetch_add() (or atomic_fetch_add()) on platforms that adopt a representation scheme other than two's complement. (I would not be surprised if no such platform really exists!)

Note that N2427 states the following concerning the above:

The normal signed integral addition operations have undefined behavior in the event of an overflow. For atomic variables, this undefined behavior introduces significant burden because there is in general no way to pre-test for overflow. Rather than leave overflow undefined, we recognize the de facto behavior of modern systems and define atomic fetch-and-add (-subtract) to use two's-complement arithmetic. We are aware of no implementation of C++ for which this definition is a problem.

(This question is asked out of academic interest; it is not prompted by any real problem I am facing.)


Solution

  • This is all hypothetical; I don't think there are any C++11 through 17 implementations for one's complement or sign/magnitude machines. (Or any C11 implementations that implement _Atomic and <stdatomic.h>; that part of C11 is optional, unlike C++'s <atomic> header.)
    C++20 (and C23) standardized two's complement for plain integer types as well, which I assume is why your title specifies 11/14/17.

    std::atomic<> was not really intended to be implemented on machines that don't use 2's complement. My understanding is that std::atomic<> is not optional, although only atomic_flag is required to be lock-free. But C11 used the same definitions and made the feature optional there. Still, this seemed like testing the waters for standardizing on 2's complement in general; also, nobody(?) was interested in writing atomic rules for hypothetical machines that don't exist and are unlikely to be built.


    fetch_add on a two's complement int is the same binary operation as on unsigned (assuming they have the same width). One of the major benefits of two's complement is not needing separate ALU configurations or separate instructions for signed vs. unsigned operations.

    So a compiler could just use the same asm as for atomic<unsigned> (which might or might not be a CAS retry loop), assuming unsigned has at least as many value bits as plain int. (It's fine if atomic<int> has more value bits than int, or if the compiler generates code to truncate them at some point, such as during conversion to int.)

    C++ requires support for unsigned integer types with well-defined wrapping, and my understanding is that most one's complement and sign/magnitude machines had both signed-add and unsigned-add instructions, at least for ISAs that could and did efficiently implement K&R C or C89. If an atomic RMW add wasn't available directly as a single instruction, then you'd have to fall back to a CAS retry loop for both atomic<unsigned> and atomic<int>.

    If the machine's primitive operation is LL/SC, then you just use that instead of CAS. If the machine's primitive atomic building block is CAS with comparison for bitwise equality like memcmp, that's also fine.
    But if it only has a signed CAS that traps on negative 0 (0x80000000 in sign/magnitude, 0xffffffff in one's complement) or treats it as equal to +0, then you'd have a problem and couldn't implement lock-free std::atomic for signed types.
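
    As a sketch of that CAS-retry-loop fallback (purely hypothetical: std::atomic<unsigned> stands in for whatever bitwise-equality CAS primitive the machine provides, and the function name and memory orders are illustrative):

        #include <atomic>

        // Do the addition on the unsigned (2's complement) bit pattern inside a
        // compare-exchange retry loop, returning the old value like fetch_add does.
        unsigned fetch_add_via_cas(std::atomic<unsigned>& obj, unsigned arg)
        {
            unsigned expected = obj.load(std::memory_order_relaxed);
            // compare_exchange_weak compares object representations (memcmp-style),
            // so a "negative zero" pattern can't be mistaken for +0.
            while (!obj.compare_exchange_weak(expected, expected + arg,
                                              std::memory_order_seq_cst,
                                              std::memory_order_relaxed))
            {
                // On failure, expected was reloaded with the observed value; retry.
            }
            return expected;
        }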


    Extra code would be needed to convert the arg and retval between the machine's native int and a 2's complement bit-pattern to use as unsigned, but not as part of the atomic operation itself. (Not inside the retry loop).

    For example, converting 1's complement to 2's complement is just adding 1 to the bit-pattern of negative numbers. (-1 is represented by the bit-patterns 0xfffffffe and 0xffffffff respectively, and in 1's complement -x = ~x.) So using unsigned right-shift and add, x += x>>31 for 32-bit machines. To convert the other direction, 2's to 1's, x -= x>>31, which will wrap to INT_MAX for the most-negative 2's complement value, -2^31.
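
    A minimal sketch of those two conversions for a hypothetical 32-bit one's complement machine, with uint32_t just carrying the bit patterns (the function names are made up):

        #include <cstdint>

        // one's complement bits -> two's complement bits: add 1 to negative patterns.
        std::uint32_t ones_to_twos(std::uint32_t bits)
        {
            return bits + (bits >> 31);   // e.g. 0xfffffffe (-1) becomes 0xffffffff
        }

        // two's complement bits -> one's complement bits: subtract 1 from negative patterns.
        std::uint32_t twos_to_ones(std::uint32_t bits)
        {
            return bits - (bits >> 31);   // 0x80000000 (-2^31) wraps to 0x7fffffff (INT_MAX)
        }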

    If the machine has wrapping int <-> unsigned conversion instructions, those should do the trick at least for conversion to unsigned. C requires int to unsigned to work as-if by modulo-reducing the value into the value-range of the destination, which for a negative input means adding 2^32 to -1 to make 4294967295 = 0xffffffff. (On 2's complement machines it's a no-op, just a reinterpretation of the bits.) I'm less sure about unsigned -> one's complement or sign/magnitude being the same as 2's complement to those formats; I haven't tried an example.
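
    That modulo-reduction rule is required on every conforming implementation, whatever the signed representation, so a quick portable sanity check is possible:

        #include <cassert>
        #include <climits>

        int main()
        {
            int neg = -1;
            unsigned u = static_cast<unsigned>(neg);   // -1 reduced modulo 2^N
            assert(u == UINT_MAX);                     // always 2^N - 1, i.e. 0xffffffff for N=32
        }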


    C++ doesn't let you read carry or signed-overflow flags as an extra output, and there aren't any atomic-RMW operations that include a compare except for equality. In hand-written asm you might want to branch on signed overflow (or, especially after a sub, on a signed condition like greater-than or less-than that involves both SF and OF), but in C++ you have to write the compare separately, on plain int values (potentially holding values converted from 2's complement atomic types).
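
    For instance, checking whether a decrement dropped a counter to zero has to be done on the plain int the RMW returns, after the fact (a made-up helper, only to show where the signed compare lives):

        #include <atomic>

        bool release_ref(std::atomic<int>& refcount)
        {
            int old = refcount.fetch_sub(1, std::memory_order_acq_rel);
            return old <= 1;   // the signed compare happens on an ordinary int, not atomically
        }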


    BTW, a separate load after an RMW is an anti-pattern. Operations from other threads could happen between the RMW and the reload, plus it's just less efficient. .fetch_sub returns the old value, so normally you'd do a.fetch_sub(1) - 1 to get the value you stored. But then you'd be doing the -1 on a plain int. You could use x = --a, but that's defined as equivalent to .fetch_sub(1) - 1 IIRC. You could initialize the atomic to -1, but doing an operation that gets it there from 0 is worth talking about. So now I see why you wrote your example that way!

        std::atomic<int> a{0};
        int result = --a;  
    
        int x = a.load();  // maybe in another thread
    
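
    To see the value you just stored without that separate racy load, the usual alternatives look like this (counter is just an illustrative name; note that each statement decrements once):

        std::atomic<int> counter{0};

        int v1 = counter.fetch_sub(1) - 1;   // the "- 1" is done on the plain int return value
        int v2 = --counter;                  // specified as equivalent to fetch_sub(1) - 1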

    As you say, a = -1 with bit-pattern 0xffffffff for the object-representation of a. (Or at least the part of a that actually holds the int value; the std::atomic<> object is allowed to be larger and have other value bits for whatever private members it might have.)

    An int return value from a std::atomic member function is a plain int, so conversion from 2's complement int to the machine's native int has to happen as part of producing the return value. (Then a copy of it is assigned to int x.)

    C++17, section 7.8 Integral conversions

    (2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
    [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]

    (3) If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.

    (From other numeric types such as FP, out-of-range conversion is UB. The above is for integral-to-integral.)

    The 2's complement most-negative value is one lower than INT_MIN on a one's complement or sign/magnitude machine of the same width. So even for equal width, the value might not be representable.

    I think even unsigned tmp = a.load(); would be unable to produce all 2^32 possible values that a could be holding (assuming a 32-bit machine again), since the value has to pass through the int return value. If it only has 2^32 - 1 distinct values, something has to collapse somewhere. Perhaps from -2^31 wrapping to +2^31 - 1 (INT_MAX). (Negative zero isn't distinct from normal 0 even if it's not a trap representation. It has to convert to unsigned 0. memcpy could extract the bit-pattern of an int...)
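
    For reference, extracting the bit-pattern with memcpy is portable on any implementation (the static_assert is only there because the examples above assume 32-bit int):

        #include <cstdint>
        #include <cstring>

        std::uint32_t bit_pattern(int x)
        {
            static_assert(sizeof(int) == sizeof(std::uint32_t), "example assumes 32-bit int");
            std::uint32_t bits;
            std::memcpy(&bits, &x, sizeof bits);   // copies the object representation verbatim
            return bits;
        }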

    I think it would make sense to consider the 2's complement representation used internally by std::atomic<int> as an integral type, although it would be worth checking the standard's wording to see whether you're allowed to be evil if implementing a C++17 Deathstation 9000 that uses non-2's-complement. (Or conversely, whether we could still be conforming by copying the bit-pattern straight to unsigned for unsigned x = a.load(), skipping conversion to/from the native signed int.)

    BTW, the language in the current draft is much simpler since only 2's complement types and unsigned exist:

    (3) Otherwise [not bool], the result is the unique value of the destination type that is congruent to the source integer modulo 2^N, where N is the width of the destination type.

    So modulo-reduction even for conversion to signed, not just unsigned. Because that's what you get in 2's complement from truncating a bit-pattern.