Tags: c, constants, language-lawyer, c-preprocessor, implementation-defined-behavior

Conditional inclusion: why is it implementation-defined whether a character constant has the same numeric value inside #if/#elif as outside it?


Case A: C11, 6.6 Constant expressions, Semantics, 5:

If a floating expression is evaluated in the translation environment, the arithmetic range and precision shall be at least as great as if the expression were being evaluated in the execution environment.

which requires the following program to return 0:

#include <float.h>

#define EXPR DBL_MIN * DBL_MAX

double d1 = EXPR;
double d2;

#pragma STDC FENV_ACCESS ON

int main(void)
{
    d2 = EXPR;
    return d1 == d2 ? 0 : 1;
}

Case B: C11, 6.10.1 Conditional inclusion, Semantics, 4:

Whether the numeric value for these character constants matches the value obtained when an identical character constant occurs in an expression (other than within a #if or #elif directive) is implementation-defined.

which does not require the following program to return 0:

#define EXPR 'z' - 'a' == 25

int main(void)
{
    _Bool b1 = 0;
    _Bool b2;
#if EXPR
    b1 = 1;
#endif
    b2 = EXPR;
    return b1 == b2 ? 0 : 1;
}

Question: what is the rationale for making "Case B" implementation-defined behavior?


Solution

  • The C11 Standard (I shall be quoting from this draft document) defines two character sets:

    5.2.1 Character sets

    1     Two sets of characters and their associated collating sequences shall be defined: the set in which source files are written (the source character set), and the set interpreted in the execution environment (the execution character set). Each set is further divided into a basic character set, whose contents are given by this subclause, and a set of zero or more locale-specific members (which are not members of the basic character set) called extended characters. The combined set is also called the extended character set. The values of the members of the execution character set are implementation-defined.

    Furthermore, nothing requires a character to have the same value in both sets, and nothing requires the Latin letters to have consecutive values (EBCDIC, for instance, leaves gaps within the alphabet). So, in the example given, the value of 'z' - 'a' need not be the same in those two sets.
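
To make the point about non-consecutive letters concrete, here is a minimal sketch that compares 'z' - 'a' under two encodings. The code points are hard-coded assumptions taken from the published ASCII and EBCDIC tables, and the names `letter_codes`, `ascii_codes`, `ebcdic_codes` and `z_minus_a` are hypothetical; nothing here queries the character sets of the compiler actually in use.

```c
#include <assert.h>

/* Hypothetical helper type: the code points of 'a' and 'z' in one encoding. */
struct letter_codes {
    int a;
    int z;
};

/* Assumed values copied from the code tables, not obtained from the compiler. */
const struct letter_codes ascii_codes  = { 0x61, 0x7A };
const struct letter_codes ebcdic_codes = { 0x81, 0xA9 };

/* Distance between 'z' and 'a' in the given encoding. */
int z_minus_a(struct letter_codes c)
{
    assert(c.z > c.a);  /* both tables place 'z' after 'a' */
    return c.z - c.a;
}
```

With these tables, `z_minus_a(ascii_codes)` is 25 while `z_minus_a(ebcdic_codes)` is 40, because EBCDIC leaves gaps after 'i' and after 'r'. The test `'z' - 'a' == 25` therefore depends on which encoding supplies the values.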

    Now, the order of the translation phases means that preprocessing directives are executed and macros are expanded (phase 4) while character constants still denote members of the source character set, whereas expressions that occur in executable code are evaluated only after conversion to the execution character set (phase 5):

    5.1.1.2 Translation phases


    4.     Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
    5.     Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

    Thus, because the relationship between those character sets is implementation-defined, and because the two occurrences of the character-based constant expression are defined to use different sets, whether they evaluate to the same value must also be implementation-defined.
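
The following sketch shows one way to observe, on a given implementation, whether the two evaluations agree. It reuses the expression from Case B; the names `PP_RESULT`, `exec_result` and `pp_matches_exec` are hypothetical, and the outcome is whatever the implementation in use documents, so no particular result is assumed.

```c
#include <stdbool.h>

/* Result of the comparison as evaluated by the preprocessor in a #if
   directive (translation phase 4). */
#if 'z' - 'a' == 25
#define PP_RESULT 1
#else
#define PP_RESULT 0
#endif

/* The same comparison evaluated as an ordinary expression, that is, with
   execution character set values (after translation phase 5). */
bool exec_result(void)
{
    return 'z' - 'a' == 25;
}

/* True when both evaluations agree on this implementation. */
bool pp_matches_exec(void)
{
    return (PP_RESULT != 0) == exec_result();
}
```

On an implementation where the source and execution character sets are both ASCII-based, one would expect `pp_matches_exec()` to return true, but the Standard only requires the implementation to document which behavior it chooses.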