
GCC/Clang 18 compilers assume unsigned int as the type of an enum


Hello and thanks for reading,

I just ran into a bug in my C program, and while I was able to find its cause, I am having trouble rationalizing the compiler's behaviour, so I could use some help. I was shifting a range of values into negative values, but a sign conversion made the result wrap around to a large positive integer instead. All in all not too exciting, but this happened because the compiler decided that an enum variable should be unsigned; that variable was used in a calculation, which caused the result of the calculation to be coerced to unsigned int as well.

Minimal reproducible example:

#include <stdio.h>

enum UnexpectedlyUnsignedEnum {
    VALUE1 = 5
};

int main(int argc, char *argv[]) {
    enum UnexpectedlyUnsignedEnum enum_value = VALUE1;
    int value1 = (1 - enum_value) / 2;
    int value2 = (1 - VALUE1) / 2;

    printf("%i\n", value1);
    printf("%i\n", value2);
}

Which results in the following output:

❯ ./example
2147483646
-2
❯

I can somewhat understand the compiler trying to "optimize" the type to unsigned int because all of its values are non-negative, although I am not sure that is a good choice: it seems likely to introduce bugs like this one, with no benefit I can see when the extra range of positive values is not needed. Still, I would expect value1 and value2 to be equal. That they are not suggests to me that storing the value in a variable is what converts enum_value to unsigned, whereas VALUE1 itself is still signed.

In addition, GCC emits the sign-conversion warning when it is enabled (here via -Wconversion), but Clang does not, even with -Wsign-conversion passed explicitly and despite producing the same output as above:

GCC

❯ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
...
❯ gcc -Wconversion -o example example.c
example.c: In function ‘main’:
example.c:9:18: warning: conversion to ‘int’ from ‘unsigned int’ may change the sign of the result [-Wsign-conversion]
    9 |     int value1 = (1 - enum_value) / 2;
      |  
❯

Clang

❯ clang --version
Ubuntu clang version 18.1.8 (++20240731024944+3b5b5c1ec4a3-1~exp1~20240731145000.144)
...
❯ clang -Wsign-conversion -o example example.c
❯

In short, can someone help me understand whether something is broken here, or what I am failing to comprehend about these seeming discrepancies?

EDIT: I am not asking how to fix the problem; that has already been resolved. I want to understand why the compiler behaves this way, as it seems counter-intuitive to me.


Solution

  • Since you are probably not using C 2024 yet, C 2018 says, in 6.4.4.3 2:

    An identifier declared as an enumeration constant has type int.

    and again, in 6.7.2.2 3:

    The identifiers in an enumerator list are declared as constants that have type int

    whereas for the enumeration itself, 6.7.2.2 4 says:

    Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration.

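    Thus, VALUE1 having type int while enum_value has, in effect, type unsigned int conforms to C 2018, provided the implementation documents its choice. That difference is what separates your two results: in 1 - enum_value, the usual arithmetic conversions convert 1 to unsigned int, so with 32-bit int the subtraction wraps around to UINT_MAX - 3 and dividing by 2 yields 2147483646, whereas 1 - VALUE1 is evaluated entirely in int and yields -2. GCC 11.4 documents its choice in clause 4.9 of its manual: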

    Normally, the type is unsigned int if there are no negative values in the enumeration, otherwise int

    In C 2024, you may specify the type to be used for an enumeration, as with:

    enum foo : int { VALUE1 = 5 };
    

    In this case, the members of the enumeration have the enumerated type itself, whose underlying type is the specified type, per C 2024 6.4.5.4:

    An identifier declared as an enumeration constant for an enumeration with a fixed underlying type has the associated enumerated type.

    (The underlying type is the specified type with qualifiers, including _Atomic, removed.)

    If no type is specified, C 2024 still leaves the type used for the enumeration implementation-defined, but it adds rules for the types of the members. Once the enumeration declaration is complete, the members have type int if int can represent all of their values, and the enumerated type otherwise. During processing of the declaration, however, such as when one member is defined by an expression containing earlier members, the rules essentially say that the types start as int and grow as needed, subject to some constraints. These temporary types govern expression evaluation inside the declaration but are discarded once the declaration is complete.