I'm reading ISO C draft standard n3096 and notice the following bold statement (§ 5.1.1.2 p1):
- The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
This statement also appears in C90, and presumably every standard in between.
My question is: why does the Standard explicitly prohibit a source file from ending in a partial preprocessing token? Is this statement not made redundant by other declarations of undefined behavior?
First, I will provide my (mis)understanding.
As a working definition, because the C Standard does not mention "partial preprocessing token" in any other place, I defer to this informative footnote in the C++ Standard (n4928, § 5.2 [lex.phases]):
10) A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. ...
...where header-name is a kind of preprocessing-token, which the C Standard defines as follows (§ 6.4 p1):
preprocessing-token:
- header-name
- identifier
- pp-number
- character-constant
- string-literal
- punctuator
- each universal-character-name that cannot be one of the above
- each non-white-space character that cannot be one of the above
From this, I believe the preprocessing tokens that can be "partial" include at least header-name, character-constant, string-literal, and hexadecimal-floating-constant (a kind of pp-number that is terminated by a binary-exponent-part).
Regarding the statement prohibiting partial preprocessing tokens at the end of a source file: I assume it is not redundant, i.e. that it prohibits some case not already prohibited elsewhere. A source file ending in a partial preprocessing token is necessarily non-empty, and a non-empty source file is already required to end in a new-line character that is not part of a line splice (§ 5.1.1.2 p1). So I believe the statement must describe a source file that ends in such a new-line and also ends in a partial preprocessing token, which would mean the partial preprocessing token contains a new-line after translation phase 2.
But preprocessing tokens do not contain new-lines after TP 2. (/* comments can, though, which is why the prohibition of partial comments makes sense to me.)
(If it is relevant, I am also confused about the concept of a partial preprocessing token in the first place... It is apparently not a preprocessing token, but I thought there are no invalid preprocessing tokens because of the fallback in the preprocessing-token rule, "each non-white-space character that cannot be one of the above.")
Sorry for the long question...
Many thanks,
Your question seems to be almost exactly the same as defect report 324 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_324.htm), which asks:
Assuming there is a non-empty source file legally ending with a new-line character, what are examples of such partial preprocessing tokens that could end the file? And, generally, what the partial preprocessing tokens are?
As a working definition, because the C Standard does not mention "partial preprocessing token" in
From DR324:
"Partial preprocessing token" is not itself a technical term; it is merely the English Language word "partial" modifying the technical term "preprocessing token". A preprocessing token is defined by the grammar non-terminal preprocessing-token in Subclause 6.4. A partial preprocessing token is therefore just part of a preprocessing token that is not the entire preprocessing token.
why does the Standard explicitly prohibit a source file from ending in a partial preprocessing token? Is this statement not made redundant by other declarations of undefined behavior?
From DR324:
The statement that "source files shall not end in a partial preprocessing token or in a partial comment" has two implications. First, a preprocessing token may not begin in one file and end in another file. Second, the last preprocessing token in a source file must be well-formed and complete. For example, the last token may not be a string literal missing the close quote.
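As I read it, "complete" here is measured against the preprocessing-token grammar, which is looser than the grammar for ordinary tokens. A minimal sketch of that point follows (the macro STR and the function check_spellings are names made up for this illustration, not anything from the DR). Each argument is one complete pp-number even though none is a valid constant, so a hexadecimal floating spelling that lacks its binary exponent is not "partial" in the preprocessing sense:

```c
#include <assert.h>
#include <string.h>

/* Illustrative helper: turn the spelling of the argument into a string literal. */
#define STR(x) #x

/*
 * Every argument below matches the pp-number grammar as a whole,
 * yet none of them could be converted to a valid constant in phase 7.
 * Stringizing happens in phase 4, so the conversion is never attempted.
 */
void check_spellings(void)
{
    assert(strcmp(STR(0x1.8), "0x1.8") == 0);  /* hex float without exponent */
    assert(strcmp(STR(12abc), "12abc") == 0);  /* digits followed by letters */
    assert(strcmp(STR(1.2.3), "1.2.3") == 0);  /* two periods */
}
```

Because the # operator consumes these spellings before they are ever converted to tokens, I would expect this to translate without diagnostics on a conforming implementation.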
Overall, I think the rule is mainly there to exclude a token that begins in one file and ends in another, as in:
// string.h
"am I a string
// main.c
#include <string.h>\
literal?"
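To make the "ends in a partial token or comment" condition concrete, here is a rough sketch of a checker. Everything in it (the enum tail_state, the function classify_tail) is hypothetical and heavily simplified: it assumes the buffer has already been through line splicing, and it ignores header-names, trigraphs and encoding prefixes. It is not how any real compiler does it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum tail_state { TAIL_OK, TAIL_PARTIAL_LITERAL, TAIL_PARTIAL_COMMENT };

/* Report whether the buffer stops inside a block comment or a quoted literal. */
enum tail_state classify_tail(const char *src)
{
    assert(src != NULL);
    for (size_t i = 0; src[i] != '\0'; ) {
        char c = src[i];
        if (c == '/' && src[i + 1] == '*') {
            /* Block comment: may span new-lines, ends only at the closing pair. */
            i += 2;
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] == '\0')
                return TAIL_PARTIAL_COMMENT;
            i += 2;
        } else if (c == '/' && src[i + 1] == '/') {
            /* Line comment: runs up to, but not including, the new-line. */
            while (src[i] != '\0' && src[i] != '\n')
                i++;
        } else if (c == '"' || c == '\'') {
            /* Quoted literal: cannot extend past a new-line. */
            char quote = c;
            i++;
            while (src[i] != '\0' && src[i] != quote && src[i] != '\n') {
                if (src[i] == '\\' && src[i + 1] != '\0' && src[i + 1] != '\n')
                    i++;  /* skip the escaped character */
                i++;
            }
            if (src[i] == '\0')
                return TAIL_PARTIAL_LITERAL;  /* buffer ended before the close quote */
            if (src[i] == quote)
                i++;
            /* Otherwise a new-line came first: an unmatched quote on its line,
               which is a separate problem (undefined behavior under 6.4),
               not a partial token at the end of the file. */
        } else {
            i++;
        }
    }
    return TAIL_OK;
}
```

With this sketch, the contents of the first file above, "am I a string with no trailing new-line, come back as TAIL_PARTIAL_LITERAL, while the same text followed by a new-line comes back as TAIL_OK, because the quote is then just an unmatched character on its line. That matches the observation in the question that, once the file is required to end in a new-line, only a comment can really be cut off by the end of the file.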