c, c-preprocessor, variadic-macros, token-pasting-operator

How to use the token pasting operator with a variable number of arguments?


I thought of having a generic version of #define concatenate(a, b, c) a ## b ## c

I tried it like this:

#include <stdio.h>

#define concatenate(arg1, ...) arg1 ## __VA_ARGS__

int main()
{
    int dob = 121201;
    printf("%d", concatenate(d, o, b));

    return 0;
}

I also tried many other ways:

#define concatenate(arg1, ...) arg1 ## ##__VA_ARGS__

#define concatenate(...) ## ##__VA_ARGS__

#define concatenate(...) ##__VA_ARGS__

#define concatenate(arg1, ...) arg1 ## ...

#define concatenate(arg1, ...) arg1 ## concatenate(##__VA_ARGS__)

Alas, all my attempts failed. I was wondering if it is even possible to do this in any way?


Solution

  • It's possible. Jens Gustedt's interesting P99 macro library includes the macro P99_PASTE, which has precisely the signature of your concatenate, as well as the same semantics.

    The machinery P99 uses to implement that macro is complex, to say the least. In particular, it relies on several hundred numbered macros, which compensate for the fact that the C preprocessor does not allow recursive macro expansion.

    Another useful explanation of how to do iteration in the C preprocessor is found in the documentation for the Boost Preprocessor Library, particularly the topic on reentrancy.

    Jens' documentation for P99_PASTE emphasizes the fact that the macro pastes left-to-right to avoid the ambiguity of ##. That might need a bit of explanation.

    The token-paste (##) operator is a binary operator; if you want to paste more than two segments into a single token, you need to do it a pair at a time, which means that all intermediate results must be valid tokens. That can require a certain amount of caution. Consider, for example, this macro which attempts to add an exponent to the end of an integer:

    #define EXPONENT(INT, EXP) INT ## E ## EXP
    

    (This will only work if both macro arguments are literal integers. In order to allow the macro arguments to be macros, we would need to introduce another level of indirection in the macro expansion. But that's not the point here.)

    What we will almost immediately discover is that EXPONENT(42,-3) doesn't work, because -3 is not a single token. It's two tokens, - and 3, and the paste operator will only paste the -. That will result in a two-token sequence 42E- 3, which will eventually lead to a compiler error.

    42E and 42E- are valid tokens, by the way. They are pp-numbers (preprocessing numbers): a pp-number starts with a digit, or with a dot followed by a digit, and continues with any combination of dots, digits, letters and exponents. (An exponent is one of the letters E or P, possibly lower-case and possibly followed by a sign. A sign character cannot appear anywhere else in a pp-number.)

    So we could try to fix this by asking the user to separate the sign from the number:

    #define EXPONENT(INT, SIGN, EXP) INT ## E ## SIGN ## EXP
    
    EXPONENT(42,-,3)
    

    That will work if the ## operators are evaluated left to right. But the C standard leaves the evaluation order of multiple ## operators unspecified. A preprocessor which works right to left would first try to paste - and 3, which fails because -3 is not a single token, just as with the simpler definition.

    Now, I can't offer an example of a compiler which will fail on this macro, since I don't have a right-to-left preprocessor handy. Both gcc and clang evaluate ## left-to-right, and I think that's far and away the most common evaluation order. But you can't rely on that; in order to write portable code, you need to ensure that the paste operators are evaluated in the expected order. And that's the guarantee offered by P99_PASTE.

    Note: It's possible that there is an application in which right-to-left pasting is required, but after thinking about it for some time, the only example I could come up with of a token paste which would work right-to-left but not left-to-right is the following rather obscure corner case:

    #define DOUBLE_HASH %: ## % ## :
    

    and I can't think of any plausible context in which that might come up.