c++ c language-lawyer implicit-conversion relational-operators

Comparing unsigned integer with negative literals


I have this simple C program.

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>

bool foo (unsigned int a) {
    return (a > -2L);
}

bool bar (unsigned long a) {
    return (a > -2L);
}

int main() {
    printf("foo returned = %d\n", foo(99));
    printf("bar returned = %d\n", bar(99));
    return 0;
}

Output when I run this:

foo returned = 1
bar returned = 0

Recreated in godbolt here

My question is: why does foo(99) return true but bar(99) return false?

To me it makes sense that bar would return false. For simplicity, let's say longs are 8 bits; then (using two's complement for the signed value):

99 == 0110 0011
-2 == unsigned 254 == 1111 1110

So clearly the CMP instruction will see that 1111 1110 is bigger and return false.

But I don't understand what is going on behind the scenes in the foo function. The assembly for foo appears to be hardcoded to always return 1 (mov eax,0x1). I would have expected foo to do something similar to bar. What is going on here?


Solution

  • This is covered in C classes and is specified in the documentation. Here is how you use documents to figure this out.

    In the 2018 C standard, you can look up > or “relational expressions” in the index to see they are discussed on pages 68-69. On page 68, you will find clause 6.5.8, which covers relational operators, including >. Reading it, paragraph 3 says:

    If both of the operands have arithmetic type, the usual arithmetic conversions are performed.

    “Usual arithmetic conversions” is listed in the index as defined on page 39. Page 39 has clause 6.3.1.8, “Usual arithmetic conversions.” This clause explains that operands of arithmetic types are converted to a common type, and it gives rules determining the common type. For two integer types of different signedness, such as the unsigned long and the long int in bar (a and -2L), it says that, if the unsigned type has rank greater than or equal to the rank of the other type, the signed type is converted to the unsigned type.

    “Rank” is not in the index, but you can search the document to find it is discussed in clause 6.3.1.1, where it tells you the rank of long int is greater than the rank of int, and that any unsigned integer type has the same rank as the corresponding signed integer type.

    Now you can consider a > -2L in bar, where a is unsigned long. Here we have an unsigned long compared with a long. They have the same rank, so -2L is converted to unsigned long. Conversion of a signed integer to unsigned is discussed in clause 6.3.1.3. It says the value is converted by wrapping it modulo ULONG_MAX+1, so converting the long value −2 produces ULONG_MAX+1−2 = ULONG_MAX−1, which is a large integer. Comparing a, which has the value 99, to that large integer with > yields false, so zero is returned.

    For foo, we continue with the rules for the usual arithmetic conversions. When the unsigned type does not have rank greater than or equal to the rank of the signed type, but the signed type can represent all the values of the type of the operand with unsigned type, the operand with the unsigned type is converted to the type of the operand with signed type. In foo, a is unsigned int and -2L is long int. Presumably in your C implementation, long int is 64 bits, so it can represent all the values of a 32-bit unsigned int. So this rule applies, and a is converted to long int. This does not change the value. So the original value of a, 99, is compared to −2 with >, and this yields true, so one is returned.