The W3C specification states the following for forms with enctype=application/x-www-form-urlencoded:
This is the default content type. Forms submitted with this content type must be encoded as follows:
1) Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
2) The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
There are a few kinds of line terminators in Unicode. Namely:
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
Are all of these converted to CR LF (\r\n)?
Are all of these converted to CR LF (\r\n)?
Nope. The HTML4 spec is unclear here on what counts as a line break, but what browsers do, and what HTML5 has gone on to standardise, is that only CR and LF are involved:
replace every occurrence of a "CR" (U+000D) character not followed by a "LF" (U+000A) character, and every occurrence of a "LF" (U+000A) character not preceded by a "CR" (U+000D) character, by a two-character string consisting of a U+000D CARRIAGE RETURN "CRLF" (U+000A) character pair
(IE doesn't quite conform to this, as it treats LFCR as a single newline, but it's close enough.)
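
To make the rule concrete, here is a minimal Python sketch of that normalisation followed by percent-encoding. The helper names `normalize_newlines` and `encode_field` are made up for illustration, UTF-8 is assumed as the form's character encoding, and `urllib.parse.quote_plus` only approximates a browser's encoder (the exact set of characters left unescaped differs slightly), so treat it as an illustration rather than a reference implementation.

```python
import re
from urllib.parse import quote_plus

# Matches a CR that is not followed by LF, or an LF that is not preceded by CR.
# Existing CRLF pairs match neither alternative and are left untouched.
_LONE_CR_OR_LF = re.compile(r"\r(?!\n)|(?<!\r)\n")


def normalize_newlines(value: str) -> str:
    """Replace every lone CR and every lone LF with a CRLF pair."""
    return _LONE_CR_OR_LF.sub("\r\n", value)


def encode_field(value: str) -> str:
    """Normalise line breaks, then percent-encode as UTF-8 (space becomes '+')."""
    return quote_plus(normalize_newlines(value))


if __name__ == "__main__":
    print(encode_field("a b\nc"))    # lone LF is sent as %0D%0A
    print(encode_field("a\u2028b"))  # LS is just percent-encoded, not converted
```

Only CR and LF are touched. VT, FF, NEL, LS and PS pass through the normalisation step unchanged and are simply percent-encoded like any other non-alphanumeric character. Note also that under this rule LFCR becomes two line breaks (the lone LF and the lone CR are each expanded), which is where the IE behaviour mentioned above differs.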