
How to bencode non-ASCII strings and non-integer numbers?


According to the bencoding specification:

Bencoded strings are encoded as follows: <string length encoded in base ten ASCII>:<string data>, or key:value. Note that there is no constant beginning delimiter, and no ending delimiter.

Example: 4:spam represents the string "spam"
Example: 0: represents the empty string ""

Integers are encoded as follows: i<integer encoded in base ten ASCII>e. The initial i and trailing e are beginning and ending delimiters. You can have negative numbers such as i-3e. Only the significant digits should be used; one cannot pad the integer with zeroes, such as i04e. However, i0e is valid.

Example: i3e represents the integer "3"
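
To make the quoted rules concrete, here is a minimal Python sketch of them. The function names are my own and not part of the specification, and rejecting i-0e is an assumption based on common practice rather than on the excerpt above.

```python
import re

# "i", then either "0" or an optionally negative number with no leading
# zeroes, then "e". Rejecting "-0" is an assumption, not quoted from the spec.
_BENCODED_INT = re.compile(rb"i(0|-?[1-9][0-9]*)e")


def bencode_int(n: int) -> bytes:
    """Encode an integer as i<base-ten ASCII digits>e."""
    return b"i" + str(n).encode("ascii") + b"e"


def is_valid_bencoded_int(data: bytes) -> bool:
    """Check that data is a single bencoded integer without zero padding."""
    return _BENCODED_INT.fullmatch(data) is not None


def bencode_ascii(s: str) -> bytes:
    """Encode an ASCII-only string as <length>:<data>."""
    raw = s.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return str(len(raw)).encode("ascii") + b":" + raw
```

With this sketch, bencode_ascii("spam") gives 4:spam and bencode_int(3) gives i3e, while is_valid_bencoded_int rejects i04e and accepts i0e. It deliberately fails on non-ASCII input, which is exactly the case I am asking about.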


My questions:

Question 1: How should I bencode a string with non-ASCII characters, for example mûrier or die höhe Zeit? Should I convert such a string to a sequence of bytes using UTF-8, or some other encoding? And how does that fit with the specification?

Question 2: How do I bencode a non-integer number, for example 1.0002910 or -0.0049172?


Solution

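    1. From the spec: "All character string values are UTF-8 encoded." So convert the string to bytes with UTF-8 and bencode those bytes. The length prefix counts bytes, not characters.
    2. Not covered by the spec, which defines only strings, integers, lists, and dictionaries; apparently a non-integer type was not needed. If you must transmit such a number, a common workaround is to bencode its decimal text as a string or to send a scaled integer, but that is a convention both sides have to agree on.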
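
As a rough illustration of both points, here is a minimal Python sketch. The function names are hypothetical, and storing a float as its decimal text is only one possible application-level convention, not something the spec defines.

```python
def bencode_text(text: str) -> bytes:
    """Bencode a character string: UTF-8 bytes prefixed by their byte count."""
    raw = text.encode("utf-8")
    # The prefix is the number of bytes, which can exceed the character count.
    return str(len(raw)).encode("ascii") + b":" + raw


def bencode_float_as_text(value: float) -> bytes:
    """Workaround (not in the spec): bencode a float's decimal text as a string."""
    return bencode_text(repr(value))


if __name__ == "__main__":
    print(bencode_text("m\u00fbrier"))         # 6 characters, 7 bytes
    print(bencode_text("die h\u00f6he Zeit"))  # 13 characters, 14 bytes
    print(bencode_float_as_text(-0.0049172))
```

Note that repr drops trailing zeroes, so 1.0002910 is stored as 1.000291. If the exact digits matter, pass the original decimal text to bencode_text directly instead of going through a float.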