Some commentary about the accepted answer is at the bottom of this question post.
According to the C standard (C17 draft, 6.10.3.2 ¶2):
The order of evaluation of [the] # and ## operators is unspecified.
I am looking for an example where this evaluation order matters and where there are no other instances of undefined behavior and no errors.
After spending some time on this matter, I suspect that the following might work:
#define PRECEDENCETEST(a, b, c) # a ## b
PRECEDENCETEST(c, , d)
(Note that the preprocessor can be run on its own as follows: cpp or gcc -E (GCC), cl /E (MSVC); see further below for a compilable dummy example. Note also that empty macro arguments have only been legal since C99.)
My question: Does this actually work as an example where either relative evaluation order of # and ## produces legal output, according to the C standard? As I explain at the bottom of this post, the answer might, if I understand correctly, depend on whether the standard allows the token after # to end up being different from the one originally specified.
If the answer is "yes (because ...)", then we've found an example! If the answer is "no, your example doesn't work (because ...)", then I'll later think of a way to solicit better examples.
(Note that the standard does not require a compiler to use any fixed relative evaluation order for the # and ## operators. The order could be left-to-right, right-to-left, follow some other logic, or be entirely random.)
Older GCC documentation (up to version 6.5 it seems) states:
The standard does not specify the order of evaluation of a chain of ‘##’ operators, nor whether ‘#’ is evaluated before, after, or at the same time as ‘##’. You should therefore not write any code which depends on any specific ordering. It is possible to guarantee an ordering, if you need one, by suitable use of nested macros.
An example of where this might matter is pasting the arguments ‘1’, ‘e’ and ‘-2’. This would be fine for left-to-right pasting, but right-to-left pasting would produce an invalid token ‘e-2’.
GCC 3.0 evaluates ‘#’ and ‘##’ at the same time and strictly left to right. Older versions evaluated all ‘#’ operators first, then all ‘##’ operators, in an unreliable order.
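To illustrate the "nested macros" remark in the quote, here is a minimal sketch of how a chain of pastes can be forced into a left-to-right order. The helper names PASTE_RAW, PASTE and LEFT_FIRST are my own and not taken from the GCC documentation; the operands are those of the 2##.##e3 example.

```c
#include <assert.h>

/* Hypothetical helper names; the two-level pattern is what matters. */
#define PASTE_RAW(a, b) a ## b
/* The extra level means the arguments are fully macro-expanded
   before PASTE_RAW applies ## to them. */
#define PASTE(a, b) PASTE_RAW(a, b)

/* Inner call pastes 2 and . into the pp-number 2. first,
   then the outer call pastes 2. and e3 into 2.e3.
   A right-to-left chain would first form .e3, which is not a valid token. */
#define LEFT_FIRST PASTE(PASTE(2, .), e3)

double left_first(void) { return LEFT_FIRST; }

/* Compile-time check: the cast makes this an integer constant expression. */
static_assert((int)LEFT_FIRST == 2000, "2 ## . ## e3, pasted left to right");
```

Each PASTE call contains exactly one ##, so no unspecified ordering is left within a single replacement list.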
(As for the ##-only example in the middle paragraph (i.e., 1##e##-2): 1e is not a valid floating-constant (C17 draft, 6.4.4.2), but it is a valid pp-number ("preprocessing number"; C17 draft, 6.4.8) because a sole e is a valid identifier-nondigit. (Preprocessing numbers exist "to isolate the preprocessor from the full complexity of numeric constants"; see the GNU documentation for its C preprocessor.) That said, a better example would have been 2##.##e3 (valid for left-to-right but not right-to-left token concatenation), adapted from this MISRA discussion.)
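The pp-number point can be checked with a small sketch. It assumes two helper macros of my own naming, STR and PASTE: 1e is accepted as a single preprocessing token as long as it never has to become a floating constant on its own.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macros for this illustration. */
#define STR(x) # x
#define PASTE(a, b) a ## b

/* 1e is one pp-number; stringifying it never converts it to a constant. */
static_assert(sizeof(STR(1e)) == 3, "two characters plus the terminator");

int stringified_pp_number_is_1e(void) { return strcmp(STR(1e), "1e") == 0; }

/* 1e ## 3 yields the pp-number 1e3, which is also a valid floating constant. */
double completed_constant(void) { return PASTE(1e, 3); }
```

Writing 1e directly in an expression would instead fail in translation phase 7, when the pp-number has to be converted into a token.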
For what it's worth, Wikipedia claims the following in its article on the C preprocessor:
[F]unction-like macro expansion occurs in the following stages:
- Stringification operations are replaced with the textual representation of their argument's replacement list (without performing expansion).
- Parameters are replaced with their replacement list (without performing expansion).
- Concatenation operations are replaced with the concatenated result of the two operands (without expanding the resulting token).
- Tokens originating from parameters are expanded.
- The resulting tokens are expanded as normal.
However, I can't find support for this specific order of evaluation in either the C standard or GNU's documentation for CPP (the C preprocessor, part of GCC), whose latest documentation as of the time of asking this question (GCC 13.2) is here.
Most importantly, none of the above-mentioned sources (including the C17 standard) provide examples of a function-like macro which would evaluate to something different depending on the relative precedence of # and ## in the replacement-list of the macro.
I'm looking for examples that don't lead to otherwise undefined behavior or an error, because macros that are seemingly valid are a potential source of hard-to-find bugs. Important in this regard are the following two constraints:
- "If the replacement that results [from the # operator] is not a valid character string literal, the behavior is undefined." (C17 draft, 6.10.3.2 ¶2)
- "If the result [of ##] is not a valid preprocessing token, the behavior is undefined." (C17 draft, 6.10.3.3 ¶3)
The search for a suitable example turns out to be surprisingly tricky.
For one thing, string literals (C17 draft, 6.4.5) – which we are considering because they are the result of applying # – can barely be concatenated with anything else using ##:
- ## cannot be used to concatenate two string literals, because something like "abc""def" wouldn't be a valid preprocessing-token (C17 draft, 6.4 ¶1). It is important to note here that ##-based token concatenation is not like the concatenation of string literals in translation phase 6 (C17 draft, 5.1.1.2 ¶1), which would merge "abc" and "def" into "abcdef".
- A string literal can be concatenated with an encoding prefix (u8, u, U, L), but writing a replacement-list like [...] ## # b that leads to valid preprocessing tokens requires a delicate balance of #s (which, aside from starting a preprocessing directive or being within a string or character literal, can only exist as part of the preprocessing tokens # and ## themselves), which I wasn't able to achieve. For example,
#define TEST(a, b) a ## # b
TEST(, c)
produces "c" under either evaluation order (assuming that # as the stringify operator can legally result from the application of ##), and I am not sure whether this example can be morphed into one producing two different valid results depending on the evaluation order.
Also, something like a ## b # c doesn't work, because in this expression the "a ## b" and "# c" parts are independent.
However, it seems like the following might work:
#include <stdio.h>
#define PRECEDENCETEST(a, b, c) # a ## b
int main(void) {
printf("%s\n", PRECEDENCETEST(c, , d));
return 0;
}
Case A: With both GCC and MSVC, I get the output c, corresponding to a #-before-## evaluation order:
PRECEDENCETEST(c, , d)
# a ## b
"c" ## b
"c" ## <placemarker>
"c"
(A placemarker preprocessing token signifies an empty macro argument adjacent to ##. (C17 draft, 6.10.3.3 ¶2))
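As a side illustration of placemarkers on their own, without any # involved, here is a minimal sketch. The macro GLUE and the enumerator names are made up for this example; it relies on empty macro arguments, so it needs C99 or later.

```c
#include <assert.h>

/* Hypothetical macro: plain token pasting. */
#define GLUE(a, b) a ## b

enum {
    GLUE(left, ) = 1,      /* left ## <placemarker>  -> left      */
    GLUE(, right) = 2,     /* <placemarker> ## right -> right     */
    GLUE(both, sides) = 3  /* both ## sides          -> bothsides */
};

static_assert(left == 1 && right == 2 && bothsides == 3,
              "pasting with a placemarker leaves the other operand unchanged");

int placemarker_sum(void) { return left + right + bothsides; }
```

Pasting a token with a placemarker simply yields that token, which is why the last step of case A above leaves "c" unchanged.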
Case B: A ##-before-# evaluation order would give us the following:
PRECEDENCETEST(c, , d)
# a ## b
# c ## <placemarker>
# c
"d"
That is, the program's output would have to be d. Or would it? The last step here assumes that # can operate not only on parameters from the original replacement-list but also on those resulting from the application of ##. Note importantly that the following constraint (C17 draft, 6.10.3.2 ¶1)
Each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list. [This doesn't apply to object-like macros.]
is not violated – it's just that in this example the actual parameter of # ends up being a different parameter (c) from the one specified in the replacement-list (a).
Commentary about the accepted answer:
I believe that the accepted answer represents the most sensible interpretation of the standard. In fact, the standard should have been written in a way to force any reader to the same conclusions.
However, I do believe that the standard's authors didn't think it through. The reason is this: The combination of the constraints discussed above with the impossibility of ##-concatenation of two string literals is relatively close to a proof that there are no cases where the same input, parsed in two different ways which differ only in the order in which # and ## are applied, leads to two different outputs that involve neither errors (such as violations of preprocessor constraints) nor undefined behavior.
For, if there are indeed no such cases, the writers of the C standard could have simply prescribed "# before ##", as adding such a prescription wouldn't be able to affect existing valid/non-UB programs. (See my discussion with the answerer for additional details/points.)
Similarly, if the C standard was as clear as the accepted answer suggests, why did the GCC maintainers and documentation authors (who evidently gave the matter some thought) not provide relevant commentary with a similar conclusion (or otherwise a contrasting example)?
The order of evaluation of [the] # and ## operators is unspecified.
I am looking for an example where this evaluation order matters and where there are no other instances of undefined behavior and no errors.
I take you to mean that you are looking for a case where two different evaluation orders yield valid (no errors) but semantically different (order matters) results.
I suspect that the following might work:
#define PRECEDENCETEST(a, b, c) # a ## b
PRECEDENCETEST(c, , d)
[...] Does this actually work as an example where either relative evaluation order of # and ## produces legal output, according to the C standard?
No.
You need to pay careful attention to the specifications for macro parameter handling and the behavior of the # and ## operators. When expanding a function-like macro, there are three possible cases for each appearance of a parameter name in the macro's replacement list:
1. the parameter [name] is neither preceded by a # or ## preprocessing token nor followed by a ## preprocessing token (C17 6.10.3.1/1). In this case, the corresponding argument's preprocessing token sequence is fully macro-expanded, then the parameter is replaced by the result.
2. the parameter [name] is immediately preceded by a # preprocessing token (C17 6.10.3.2/2). In this case, the corresponding argument's preprocessing token sequence is stringified, then the # and the parameter are replaced by the result.
3. the parameter [name] is immediately preceded or followed by a ## preprocessing token (C17 6.10.3.3/2). In this case, the parameter is first replaced by the corresponding argument's preprocessing token sequence or by a placemarker token, as appropriate. Then, before rescan but (implicitly) after the parameter replacements required by this paragraph, the appropriate token pasting is applied to the preprocessing tokens bracketing each ## from the replacement list.
Where the spec says "parameter" in these clauses, it is talking about a single preprocessing token from the replacement list, containing the parameter name, not about the preprocessing tokens of the corresponding argument. Therefore, only one of those cases can be applied to any given appearance of a parameter. Once that appearance has been replaced with something else according to one of those rules, there is no longer a parameter there to be replaced according to one of the other rules.
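The three cases can be observed separately with a small sketch. All names here (N, N1, PLAIN, QUOTED, GLUED) are made up for illustration, and each macro exercises exactly one case, so no evaluation-order question arises.

```c
#include <assert.h>
#include <string.h>

#define N 7
enum { N1 = 100 };        /* target of the pasted identifier below */

#define PLAIN(x)  x       /* case 1: argument is fully macro-expanded first */
#define QUOTED(x) # x     /* case 2: argument is stringified, unexpanded    */
#define GLUED(x)  x ## 1  /* case 3: argument is pasted, unexpanded         */

static_assert(PLAIN(N) == 7, "N was expanded to 7 before substitution");
static_assert(sizeof(QUOTED(N)) == 2, "the string is \"N\", not \"7\"");
static_assert(GLUED(N) == 100, "the result is the identifier N1, not 71");

int quoted_is_parameter_spelling(void) { return strcmp(QUOTED(N), "N") == 0; }
```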
Your example involves a parameter that is preceded by a # and followed by a ##, so that its replacement could be performed according to either case (2) or case (3). We could argue that the spec does not define what happens in cases where both of those are applicable (thus undefined behavior), but suppose we don't go there, and instead look at evaluation order. Then,
- Applying stringification first works. The # and a are replaced with string literal token "c". The operands of ## don't have to be parameters, so it's ok that there's no parameter remaining as the left-hand operand of ##. The b is replaced (then or earlier) with a placemarker token, and the concatenation is performed sometime before rescan, yielding "c".
- Applying first the token replacement called for by the ## operator does not work. Having performed that replacement, there is no longer a parameter to be replaced during evaluation of the # operation. The result of choosing this order of evaluation is therefore undefined behavior.
Side note: there is no possible way in which your example could be expected to yield "d", even if we were much looser about distinguishing between parameters and their corresponding argument sequences. Parameter c does not appear in the replacement list of your macro, so its corresponding argument makes no contribution to the resulting expansion. And even if we tacked on a c at the end of the replacement list, there's no reason to think that the scope of the stringification operator would extend that far, no matter what the macro's second argument is.
Overall, if ever there is a meaningful evaluation-order choice between # and ##, that choice is between conditionally undefined behavior from stringifying first and unconditionally undefined behavior from pasting first (or at least performing the parameter-replacement part of that first). Since even the stringification-first case works only when the other operand of the ## operator is a parameter whose corresponding argument is an empty sequence, there seems very little point to forming such constructs.
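If a particular order is actually wanted, it can be spelled out with helper macros so that no single parameter sits between a # and a ##. The following is only a sketch under that assumption; the names STR_RAW, PASTE_RAW, STR_OF_PASTED and WIDE_STR are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical single-operator helpers. */
#define STR_RAW(x) # x
#define PASTE_RAW(a, b) a ## b

/* Paste first, then stringify: ## is applied while STR_OF_PASTED is
   replaced, and STR_RAW only sees the already pasted token on rescan. */
#define STR_OF_PASTED(a, b) STR_RAW(a ## b)

/* Stringify first, then paste: # is applied while WIDE_STR is replaced,
   and PASTE_RAW then glues the prefix L onto the finished string literal. */
#define WIDE_STR(x) PASTE_RAW(L, # x)

static_assert(sizeof(STR_OF_PASTED(c, d)) == 3, "\"cd\" plus the terminator");

int pasted_then_stringified(void) {
    return strcmp(STR_OF_PASTED(c, d), "cd") == 0;
}

int stringified_then_prefixed(void) {
    return wcscmp(WIDE_STR(abc), L"abc") == 0;
}
```

In each replacement list, every parameter falls under exactly one of the three cases above, so the question of relative order does not come up.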