After studying the HTTP/1.1 standard, specifically page 31 and the related sections, I came to the conclusion that any 8-bit octet can appear in an HTTP header value, i.e. any character with a code in the [0, 255] range.
And yet the HTTP servers I tried refuse to accept anything with a code > 127 (or most non-printable US-ASCII characters).
Here is a condensed excerpt of the grammar used in the standard:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value and consisting of
either *TEXT or combinations of token, separators, and
quoted-string>
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
CRLF = CR LF
LWS = [CRLF] 1*( SP | HT )
OCTET = <any 8-bit sequence of data>
CHAR = <any US-ASCII character (octets 0 - 127)>
CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
TEXT = <any OCTET except CTLs, but including LWS>
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\"
| <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext = <any TEXT except <">>
quoted-pair = "\" CHAR
As you can see, field-content can be a quoted-string, which is a quoted sequence of TEXT (i.e. any 8-bit octet except " and values from the [0-8, 11-12, 14-31, 127] range) or quoted-pair (\ followed by any value from the [0, 127] range). In other words, any 8-bit character sequence can be passed by quoting it and prefixing special symbols with \.
(Note that the standard doesn't treat the NUL (0x00) character in any special way.)
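To make that reading concrete, here is a minimal Python sketch of the encoder it implies. The function name `quote_rfc2616` is hypothetical, and the code follows only the obsolete RFC 2616 grammar quoted above, so it illustrates the interpretation rather than something a real server should be expected to accept (it would even emit a raw CR or LF after a backslash).

```python
def quote_rfc2616(raw: bytes) -> bytes:
    """Wrap raw octets in a quoted-string per the (obsolete) RFC 2616 grammar."""
    out = bytearray(b'"')
    for octet in raw:
        is_ctl = octet < 0x20 or octet == 0x7F  # CTL = 0-31 and DEL (127)
        if is_ctl or octet in (0x22, 0x5C):
            # quoted-pair: "\" followed by a CHAR (0-127).
            # Used here for CTLs, the double quote and the backslash itself.
            out += bytes((0x5C, octet))
        else:
            # qdtext: every other octet, including 128-255.
            out.append(octet)
    out.append(0x22)
    return bytes(out)


# UTF-8 bytes of "é" are above 127, so they pass through as qdtext.
print(quote_rfc2616("é".encode("utf-8")))
# Double quotes inside the value are escaped as quoted-pairs.
print(quote_rfc2616(b'say "hi"'))
```

Under this reading the UTF-8 octets go on the wire unchanged, which is exactly what the servers in question reject.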
But evidently either all the servers I tried are non-conforming, or the standard has changed since 1999, or I am misreading it.
So... which characters are allowed in HTTP header values, and why?
P.S. The reason behind all of this: I am looking for a way to pass a UTF-8-encoded sequence in an HTTP header value (without additional encoding, if possible).
RFC 2616 is obsolete; the relevant part has been replaced by RFC 7230. Its list of changes from RFC 2616 says:
The NUL octet is no longer allowed in comment and quoted-string text, and handling of backslash-escaping in them has been clarified. The quoted-pair rule no longer allows escaping control characters other than HTAB. Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)
In essence, RFC 2616 defaulted to ISO-8859-1, and this was both insufficient and not interoperable anyway. Thus, RFC 7230 has deprecated non-ASCII octets in field values. The recommendation is to use an escaping mechanism on top of that (such as the one defined in RFC 8187, or plain URI percent-encoding).
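As a rough illustration of the two escaping options, here is a minimal Python sketch built on the standard library's `urllib.parse`. The helper names (`percent_encode`, `encode_ext_value`, `decode_ext_value`) are hypothetical. Note also that RFC 8187 defines its encoding for header parameter values (for example `filename*=...`), not for whole field values, so the second helper only produces the part after the `=`.

```python
from urllib.parse import quote, unquote

# attr-char symbols from RFC 8187 that quote() would otherwise escape.
# quote() never escapes letters, digits, and "_.-~".
ATTR_CHAR_EXTRA = "!#$&+^`|"


def percent_encode(text: str) -> str:
    """Plain percent-encoding of the UTF-8 bytes; the result is pure ASCII."""
    return quote(text, safe="", encoding="utf-8")


def encode_ext_value(text: str, language: str = "") -> str:
    """Build an RFC 8187 ext-value: charset ' [language] ' pct-encoded value."""
    encoded = quote(text, safe=ATTR_CHAR_EXTRA, encoding="utf-8")
    return f"UTF-8'{language}'{encoded}"


def decode_ext_value(ext_value: str) -> str:
    """Reverse encode_ext_value (assumes a charset Python knows, e.g. UTF-8)."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)


print(percent_encode("naïve €"))
print(encode_ext_value("£ rates"))
print(decode_ext_value(encode_ext_value("£ and € rates", "en")))
```

Either way the receiving side has to know which scheme was applied and decode accordingly; nothing in the header itself signals plain percent-encoding.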