I can't seem to fully understand the application of the "or" in BNF Grammar which is denoted by the vertical bar symbol (|). A good example of what gets me confused is the description of string literals in The Python Language Reference. (I've deleted part of the description which is irrelevant to the question):
stringliteral ::= [stringprefix](shortstring | longstring)
shortstring ::= "'" shortstringitem* "'" | '"' shortstringitem* '"'
shortstringitem ::= shortstringchar | stringescapeseq
shortstringchar ::= <any source character except "\" or newline or the quote>
stringescapeseq ::= "\" <any source character>
So, the way I understand the description of <shortstringitem> is that it can be <shortstringchar> OR <stringescapeseq>. Does this mean it cannot be both at the same time? If I am not mistaken, a single string may contain both at the same time... (For clarity, <shortstringchar> as I understand it is the text of my string.)
Thank you.
I searched the web, including Stack Overflow, and watched explanatory videos, but they all seem to describe the "or" with something like:
<letter> ::= A|B|C|D|E...Y|Z.
They do not go any deeper with the examples, so unfortunately this does not answer my question.
One shortstringitem can only be one or the other. But a shortstring can consist of multiple shortstringitems (that is what the * in shortstringitem* means: zero or more repetitions), each of which is "expanded" independently.
Consider 'x\n', for example, which you could parse as
'x\n' -> stringliteral
-> shortstring
-> "'" shortstringitem shortstringitem "'"
-> "'" shortstringchar stringescapeseq "'"
-> "'" 'x' '\' 'n' "'"
The first shortstringitem is recognized as a shortstringchar, the second as a stringescapeseq.
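To make the "each item is chosen independently" point concrete, here is a minimal sketch of a hand-written recognizer for the shortstring rule. It is not how CPython's tokenizer actually works, and the function name classify_shortstring is made up for this illustration. It walks the text between the quotes and applies the | choice once per item.

```python
def classify_shortstring(literal):
    """Split a short string literal into (rule_name, text) items.

    Hypothetical helper for illustration only; it mirrors the grammar
    shortstring ::= "'" shortstringitem* "'" | '"' shortstringitem* '"'
    """
    if len(literal) < 2 or literal[0] not in ("'", '"') or literal[-1] != literal[0]:
        raise ValueError("not a shortstring")
    quote = literal[0]
    body = literal[1:-1]

    items = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # stringescapeseq ::= "\" <any source character>
            if i + 1 >= len(body):
                raise ValueError("backslash with nothing after it")
            items.append(("stringescapeseq", body[i:i + 2]))
            i += 2
        elif ch == "\n" or ch == quote:
            # shortstringchar excludes newline and the enclosing quote
            raise ValueError("character not allowed in a shortstring")
        else:
            # shortstringchar ::= any other source character
            items.append(("shortstringchar", ch))
            i += 1
    return items


# Raw string so the backslash and the n stay as two source characters.
for kind, text in classify_shortstring(r"'x\n'"):
    print(kind, repr(text))
```

Every pass through the loop makes the shortstringchar | stringescapeseq decision again, so one string can freely mix both kinds of item even though each individual item is exactly one of them.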