What are the exact syntax and semantics of a quoted-string in HTTP/1.1 (RFC 2616)?


RFC 2616, the HTTP/1.1 standard, defines a quoted string as follows:

quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
quoted-pair    = "\" CHAR
CHAR           = <any US-ASCII character (octets 0 - 127)>
qdtext         = <any TEXT except <">>
TEXT           = <any OCTET except CTLs, but including LWS>

With this definition, "\" (a lone backslash) seems to be TEXT, and therefore <">\<"> (quote, backslash, quote) seems to be a valid quoted-string. But this contradicts the use of the backslash as an escape character, and it can even make it impossible to determine unambiguously where the quoted-string ends. Where is my error?
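The ambiguity can be reproduced with a small sketch that transcribes the grammar as literally quoted above into a regular expression. The names QDTEXT_LITERAL, QUOTED_PAIR and QUOTED_STRING_LITERAL are made up for this illustration, and LWS folding (CRLF followed by SP or HT) is left out to keep it short.

```python
import re

# qdtext as literally written: any TEXT except <">, so the backslash (0x5C)
# is included. CTLs (0-31, 127) are excluded; HT and SP are kept as part of LWS.
QDTEXT_LITERAL = rb'[\t\x20\x21\x23-\x7e\x80-\xff]'
# quoted-pair = "\" CHAR, where CHAR is any US-ASCII octet (0-127).
QUOTED_PAIR = rb'\\[\x00-\x7f]'

QUOTED_STRING_LITERAL = re.compile(
    rb'"(?:' + QDTEXT_LITERAL + rb'|' + QUOTED_PAIR + rb')*"'
)

three = b'"\\"'    # quote, backslash, quote
four = b'"\\""'    # quote, backslash, quote, quote

# Both inputs are complete quoted-strings under the literal grammar:
# in `three` the backslash is read as qdtext, in `four` as a quoted-pair.
print(bool(QUOTED_STRING_LITERAL.fullmatch(three)))
print(bool(QUOTED_STRING_LITERAL.fullmatch(four)))

# Scanning a longer input, a parser cannot tell which reading was intended.
# This regex happens to stop after three octets because it tries qdtext first.
print(QUOTED_STRING_LITERAL.match(b'"\\"" tail').group(0))
```

Since `three` is a prefix of `four` and both are accepted, the closing quote cannot be located from the grammar alone.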

The RFC also states:

LWS            = [CRLF] 1*( SP | HT )

All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.

I have read the interpretation that even LWS inside quoted strings can be replaced by SP. Taken literally, that is what the RFC says. This puzzles me, because it means the quoted strings " ", "\n ", "\n\t \t \t", … are all equivalent. Can those quoted strings really not be distinguished semantically?
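The replacement the RFC permits can be sketched as follows. The helper name collapse_lws is hypothetical, and the line break in the samples is written as CRLF, since that is what the LWS rule requires.

```python
import re

# LWS = [CRLF] 1*( SP | HT )
LWS = re.compile(r"(?:\r\n)?[ \t]+")

def collapse_lws(field_value: str) -> str:
    """Replace every run of linear white space with a single SP."""
    return LWS.sub(" ", field_value)

samples = ['" "', '"\r\n "', '"\r\n\t \t \t"']

# A recipient that applies this replacement sees the same value for all three.
print({collapse_lws(s) for s in samples})
```

A recipient that normalizes like this before interpreting the field value has no way to recover which of the three forms was sent.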


Solution

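  • Re question 1: It's a bug in the RFC. qdtext was meant to exclude the backslash, so a backslash inside a quoted-string always starts a quoted-pair.

    See HTTPbis WG ticket 31 and HTTPbis, Part 1, Section 3.2.3.

  • Re question 2: See HTTPbis, Part 1, Section 3.2.1. So no, you can't distinguish these.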
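For illustration, here is a minimal sketch of a parser under the assumption that the fix is to exclude the backslash from qdtext, which is how the later HTTP/1.1 revision (RFC 7230) defines it. The function name parse_quoted_string is made up here, and the sketch does not reject control characters.

```python
def parse_quoted_string(s: str, start: int = 0):
    """Return (unescaped value, index just past the closing quote).

    Assumes a backslash always introduces a quoted-pair, so the first
    unescaped double quote ends the string.
    """
    if start >= len(s) or s[start] != '"':
        raise ValueError("expected opening quote")
    out = []
    i = start + 1
    while i < len(s):
        ch = s[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\":
            if i + 1 >= len(s):
                raise ValueError("dangling backslash")
            out.append(s[i + 1])  # keep the escaped character itself
            i += 2
            continue
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted-string")

value, end = parse_quoted_string('"a\\"b" rest')
print(value, end)
```

With this reading, quote, backslash, quote is an unterminated string rather than a complete one, so the end of a quoted-string is always well defined.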