When using HTTP/1.1 to submit a document with the multipart/form-data content type, each part must contain a Content-Disposition header with the form field name given as the name parameter:
--------------------------f0261bc90f5d4215
Content-Disposition: form-data; name="field name"

<field_data>
How should the characters in the name parameter of each part be escaped?
For example, curl seems to use a backslash to escape double quotes (") and backslashes (\), but otherwise passes UTF-8 data through unchanged. However, I can't find where this is specified, nor which set of characters must be escaped.
$ nc -l 1234 | hexdump -C&
$ curl -s -m1 http://localhost:1234 --form 'a"a=1' --form 'béb=2'
00000000 50 4f 53 54 20 2f 20 48 54 54 50 2f 31 2e 31 0d |POST / HTTP/1.1.|
00000010 0a 48 6f 73 74 3a 20 6c 6f 63 61 6c 68 6f 73 74 |.Host: localhost|
00000020 3a 31 32 33 34 0d 0a 55 73 65 72 2d 41 67 65 6e |:1234..User-Agen|
00000030 74 3a 20 63 75 72 6c 2f 37 2e 35 38 2e 30 0d 0a |t: curl/7.58.0..|
00000040 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a 43 6f 6e |Accept: */*..Con|
00000050 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 32 33 34 |tent-Length: 234|
00000060 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 |..Content-Type: |
00000070 6d 75 6c 74 69 70 61 72 74 2f 66 6f 72 6d 2d 64 |multipart/form-d|
00000080 61 74 61 3b 20 62 6f 75 6e 64 61 72 79 3d 2d 2d |ata; boundary=--|
00000090 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |----------------|
000000a0 2d 2d 2d 2d 2d 2d 37 32 65 65 63 63 37 65 39 61 |------72eecc7e9a|
000000b0 65 65 66 30 31 37 0d 0a 0d 0a 2d 2d 2d 2d 2d 2d |eef017....------|
000000c0 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |----------------|
000000d0 2d 2d 2d 2d 37 32 65 65 63 63 37 65 39 61 65 65 |----72eecc7e9aee|
000000e0 66 30 31 37 0d 0a 43 6f 6e 74 65 6e 74 2d 44 69 |f017..Content-Di|
000000f0 73 70 6f 73 69 74 69 6f 6e 3a 20 66 6f 72 6d 2d |sposition: form-|
00000100 64 61 74 61 3b 20 6e 61 6d 65 3d 22 61 5c 22 61 |data; name="a\"a|
00000110 22 0d 0a 0d 0a 31 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d |"....1..--------|
00000120 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |----------------|
00000130 2d 2d 37 32 65 65 63 63 37 65 39 61 65 65 66 30 |--72eecc7e9aeef0|
00000140 31 37 0d 0a 43 6f 6e 74 65 6e 74 2d 44 69 73 70 |17..Content-Disp|
00000150 6f 73 69 74 69 6f 6e 3a 20 66 6f 72 6d 2d 64 61 |osition: form-da|
00000160 74 61 3b 20 6e 61 6d 65 3d 22 62 c3 a9 62 22 0d |ta; name="b..b".|
00000170 0a 0d 0a 32 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |...2..----------|
00000180 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |----------------|
00000190 37 32 65 65 63 63 37 65 39 61 65 65 66 30 31 37 |72eecc7e9aeef017|
000001a0 2d 2d 0d 0a |--..|
000001a4
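As a sanity check on the framing, the body in the dump can be rebuilt byte for byte. The following Python sketch assumes the escaping rule observed above (a backslash before " and \, every other byte passed through as UTF-8); the boundary is copied from the dump, and the helper names quote_name and build_body are hypothetical, not part of curl.

```python
# Minimal sketch: rebuild the multipart/form-data body from the dump above.
# Assumptions: the boundary is copied from the dump; the escaping rule
# (backslash before '"' and '\', all other bytes unchanged) is inferred
# from curl's observed output, not taken from curl's source code.

BOUNDARY = b"-" * 24 + b"72eecc7e9aeef017"


def quote_name(name: bytes) -> bytes:
    """Wrap a field name in double quotes, backslash-escaping '"' and '\\'."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = name.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'


def build_body(fields: list[tuple[str, str]], boundary: bytes) -> bytes:
    """Serialise (name, value) pairs as multipart/form-data, UTF-8 encoded."""
    out = bytearray()
    for name, value in fields:
        out += b"--" + boundary + b"\r\n"
        out += (b"Content-Disposition: form-data; name="
                + quote_name(name.encode("utf-8")) + b"\r\n")
        out += b"\r\n"                          # blank line ends the part headers
        out += value.encode("utf-8") + b"\r\n"
    out += b"--" + boundary + b"--\r\n"         # closing delimiter
    return bytes(out)


body = build_body([('a"a', "1"), ("béb", "2")], BOUNDARY)
print(len(body))  # 234, the Content-Length curl sent
```

Comparing the output with the hexdump shows the same bytes, including the 5c 22 sequence for the escaped quote and the raw c3 a9 for "é".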
RFC 7230 (section 3.2.6) specifies how backslash escaping works for quoted strings in HTTP header fields:
A string of text is parsed as a single value if it is quoted using double-quote marks.
  quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
  qdtext         = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text
  obs-text       = %x80-FF
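Transcribed into a byte-level regular expression, the grammar can be used to check whether a Content-Disposition parameter value is a valid quoted-string. This is a sketch using Python's re module; the pattern names and the is_quoted_string helper are hypothetical, and VCHAR is taken as %x21-7E as in the core ABNF rules.

```python
import re

# Byte-level transcription of the RFC 7230 rules:
#   qdtext      = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
#   quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
#   obs-text    = %x80-FF ; VCHAR = %x21-7E
QDTEXT = rb"[\t \x21\x23-\x5b\x5d-\x7e\x80-\xff]"
QUOTED_PAIR = rb"\\[\t \x21-\x7e\x80-\xff]"
QUOTED_STRING = re.compile(rb'"(?:' + QDTEXT + rb"|" + QUOTED_PAIR + rb')*"')


def is_quoted_string(value: bytes) -> bool:
    """Return True if the whole byte string matches the quoted-string rule."""
    return QUOTED_STRING.fullmatch(value) is not None


# Both name parameters from the dump are valid quoted-strings.
print(is_quoted_string(b'"a\\"a"'))                 # True
print(is_quoted_string('"béb"'.encode("utf-8")))    # True
```

An unescaped inner quote (b'"a"a"') or a raw line feed (b'"a\nb"') fails the check, because neither %x22 nor %x0A is allowed in qdtext.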
The backslash octet ("\") can be used as a single-octet quoting
mechanism within quoted-string and comment constructs. Recipients
that process the value of a quoted-string MUST handle a quoted-pair
as if it were replaced by the octet following the backslash.

  quoted-pair    = "\" ( HTAB / SP / VCHAR / obs-text )
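The recipient rule, replacing each quoted-pair by the octet that follows the backslash, amounts to a short loop. This is a minimal Python sketch with a hypothetical unquote helper; it works on raw bytes, leaves character-set decoding to the caller, and for brevity does not reject the control characters that the grammar forbids.

```python
def unquote(value: bytes) -> bytes:
    """Decode a quoted-string: strip the surrounding DQUOTEs and replace each
    quoted-pair with the octet that follows the backslash."""
    if len(value) < 2 or value[:1] != b'"' or value[-1:] != b'"':
        raise ValueError("not a quoted-string")
    inner = value[1:-1]
    out = bytearray()
    i = 0
    while i < len(inner):
        octet = inner[i]
        if octet == 0x5C:                 # backslash starts a quoted-pair
            if i + 1 == len(inner):
                raise ValueError("dangling backslash")
            out.append(inner[i + 1])      # keep only the escaped octet
            i += 2
        elif octet == 0x22:               # a bare DQUOTE cannot appear inside
            raise ValueError("unescaped double quote")
        else:
            out.append(octet)
            i += 1
    return bytes(out)


print(unquote(b'"a\\"a"'))                                    # b'a"a'
print(unquote('"béb"'.encode("utf-8")).decode("utf-8"))       # béb
```

The result is a byte string; interpreting it as UTF-8, as curl's output suggests, is a separate decision that this grammar does not make.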
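This defines how backslash escaping works. Note that obs-text (%x80-FF) is allowed both in qdtext and in quoted-pair, so octets above the ASCII range, such as the UTF-8 bytes c3 a9 of "é" in the dump above, can appear unescaped. The only other octets missing from qdtext are the double quote (%x22), the backslash (%x5C), DEL (%x7F), and the control characters apart from HTAB. Of these, only " and \ are VCHARs that may follow a backslash, so they are the only characters that need escaping, which matches curl's behavior; control characters (except HTAB) and DEL cannot appear in a quoted-string at all.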
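The classification can also be derived mechanically by testing all 256 octets against the qdtext and quoted-pair rules. This Python sketch assumes, as the answer argues, that the Content-Disposition name parameter follows the RFC 7230 quoted-string grammar; the helper names, including the strict quote encoder, are hypothetical.

```python
HTAB, SP = 0x09, 0x20


def in_qdtext(b: int) -> bool:
    # qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text (%x80-FF)
    return b in (HTAB, SP, 0x21) or 0x23 <= b <= 0x5B or 0x5D <= b <= 0x7E or b >= 0x80


def in_quoted_pair(b: int) -> bool:
    # octet allowed after "\": HTAB / SP / VCHAR (%x21-7E) / obs-text
    return b in (HTAB, SP) or 0x21 <= b <= 0x7E or b >= 0x80


must_escape = [b for b in range(256) if not in_qdtext(b) and in_quoted_pair(b)]
unrepresentable = [b for b in range(256) if not in_qdtext(b) and not in_quoted_pair(b)]


def quote(raw: bytes) -> bytes:
    """Strict encoder: escape only what qdtext excludes, reject what the
    grammar cannot represent at all."""
    out = bytearray(b'"')
    for b in raw:
        if not in_quoted_pair(b):
            raise ValueError(f"octet {b:#04x} cannot appear in a quoted-string")
        if not in_qdtext(b):
            out.append(0x5C)              # prefix a backslash (quoted-pair)
        out.append(b)
    out.append(0x22)
    return bytes(out)


print([hex(b) for b in must_escape])   # ['0x22', '0x5c']
print(len(unrepresentable))            # 32: 0x00-0x08, 0x0a-0x1f, 0x7f
```

Rejecting rather than silently dropping unrepresentable octets keeps the encoder honest: a field name containing, say, a line feed has no valid quoted-string form under this grammar.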