Tags: c++, c, integer, c-preprocessor, suffix

To which degree does the C preprocessor regard integer literal suffixes?


Today, I stumbled over something like this:

#define FOO 2u

#if (FOO == 2)
  unsigned int foo = FOO;
#endif

Regardless of why the code is as it is (let's not question the why), I was wondering to which degree the preprocessor can even handle integer literal suffixes. I was actually surprised that it works at all. After doing some experiments with GCC and C99 with this code ...

#include <stdio.h>

int main()
{
  #if (1u == 1)
    printf("1u == 1\n");
  #endif

  #if (1u + 1l == 2ll)
    printf("1u + 1l == 2ll\n");
  #endif

  #if (1ull - 2u == -1)
    printf("1ull - 2u == -1\n");
  #endif

  #if (1u - 2u == 0xFFFFFFFFFFFFFFFF)
    printf("1u - 2u == 0xFFFFFFFFFFFFFFFF\n");
  #endif

  #if (-1 == 0xFFFFFFFFFFFFFFFF)
    printf("-1 == 0xFFFFFFFFFFFFFFFF\n");
  #endif

  #if (-1l == 0xFFFFFFFFFFFFFFFF)
    printf("-1l == 0xFFFFFFFFFFFFFFFF\n");
  #endif

  #if (-1ll == 0xFFFFFFFFFFFFFFFF)
    printf("-1ll == 0xFFFFFFFFFFFFFFFF\n");
  #endif
}

... which just prints all the statements:

1u == 1
1u + 1l == 2ll
1ull - 2u == -1
1u - 2u == 0xFFFFFFFFFFFFFFFF
-1 == 0xFFFFFFFFFFFFFFFF
-1l == 0xFFFFFFFFFFFFFFFF
-1ll == 0xFFFFFFFFFFFFFFFF

... I guess the preprocessor simply ignores integer literal suffixes altogether and probably always does arithmetic and comparisons in the native integer size, in this case 64-bit?

So, here is the stuff I'd like to know:

  1. To which degree does the preprocessor regard integer literal suffixes? Or does it just ignore them?
  2. Are there any dependencies or different behaviors with different environments, e.g. different compilers, C vs. C++, 32 bit vs. 64 bit machine, etc.? I.e., what does the preprocessor's behavior depend on?
  3. Where is all that specified/documented?

I wanted to find out for myself and checked Wikipedia and the C standard (working paper). I found information about integer suffixes and information about the preprocessor, but none about the combination of the two. Obviously, I have also googled it but didn't get any useful results.

I have seen this Stack Overflow question that clarifies where it should be specified, but I still couldn't find an answer to my questions.


Solution

    1. To which degree does the preprocessor regard integer literal suffixes? Or does it just ignore them?

    The type suffixes of integer constants are not inherently meaningful to the preprocessor, but they are an inherent part of the corresponding preprocessing tokens, not separate. The standard has this to say about them:

    A preprocessing number begins with a digit optionally preceded by a period (.) and may be followed by valid identifier characters and the character sequences e+, e-, E+, E-, p+, p-, P+, or P-.

    Preprocessing number tokens lexically include all floating and integer constant tokens.

    (C11 6.4.8/2-3; emphasis added)

    For the most part, the preprocessor doesn't treat preprocessing tokens of this type any differently from any other. The exception is in the controlling expressions of #if and #elif directives, which are evaluated by performing macro expansion, replacing any remaining identifiers with 0, and then converting each preprocessing token into a token before evaluating the result according to C rules. Converting to tokens accounts for the type suffixes, yielding bona fide integer constants.

    This does not necessarily produce results identical to those you would get from runtime evaluation of the same expressions, however, because

    For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t.

    (C11 6.10.1/4)

    You go on to ask

    2. Are there any dependencies or different behaviors with different environments, e.g. different compilers, C vs. C++, 32 bit vs. 64 bit machine, etc.? I.e., what does the preprocessor's behavior depend on?

    The only direct dependency is the implementation's definitions of intmax_t and uintmax_t. These are not directly tied to language choice or machine architecture, though there may be correlations with those.

    3. Where is all that specified/documented?

    In the respective languages' specifications, of course. I've cited two of the more relevant sections of the C11 specification, and linked you to a late draft of that standard. (The current C is C18, but it hasn't changed in any of these regards.)