ruby encoding stringio

Why does Ruby's StringIO return strings with different encodings?


Why do I get different encodings in the following code?

require 'stringio'
a = StringIO.new('toto')
a.read(2).encoding
# => #<Encoding:ASCII-8BIT>

a.read.encoding
# => #<Encoding:UTF-8>

a.read.encoding
# => #<Encoding:ASCII-8BIT>

Solution

  • Let's dissect your code...

    a.read(2)
    

    This reads two bytes from the stream and returns them as a String. Because you are requesting a specific number of bytes, Ruby can't guarantee that the result ends on a character boundary. For that reason, a read with a length is specified to return a binary-encoded string, i.e. Encoding:ASCII-8BIT.
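
    To see why a byte count can't respect character boundaries, here is a minimal sketch that uses a multibyte character instead of 'toto'. The variable names are only illustrative, and the string is written as an escape so that it is UTF-8 regardless of the source file's encoding.

    ```ruby
    require 'stringio'

    # "\u00E9" is the letter e with an acute accent: two bytes in UTF-8 (0xC3 0xA9).
    io = StringIO.new("\u00E9")

    # Ask for one byte only, which cuts the character in half.
    chunk = io.read(1)

    p chunk.encoding   # the binary encoding (ASCII-8BIT)
    p chunk.bytes      # only the first byte of the character

    # Relabelling the lone byte as UTF-8 yields an invalid string,
    # which is why Ruby does not do that for you.
    relabelled = chunk.dup.force_encoding(Encoding::UTF_8)
    p relabelled.valid_encoding?
    ```

    Had read(1) claimed the result was UTF-8, it would have handed you a broken string; tagging it as binary is the honest answer.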

    In your next line, you are using

    a.read
    

    This reads until the end of the stream and returns all remaining data. Without a length, the returned string carries the stream's external encoding, which for a StringIO is taken from the underlying string unless you change it with set_encoding (in your case UTF-8).
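
    If you want both reads to report the same encoding, one option is to relabel the fixed-length chunk yourself. The sketch below assumes the chunk ends on a character boundary, which is true for the ASCII-only 'toto'. The encoding is pinned explicitly so the result does not depend on how the script is run.

    ```ruby
    require 'stringio'

    io = StringIO.new("toto".encode("UTF-8"))

    head = io.read(2)   # fixed length: comes back as binary
    tail = io.read      # no length: comes back in the stream's encoding

    p head.encoding
    p tail.encoding
    p io.external_encoding

    # Relabel the bytes without changing them. This is only safe when
    # the chunk does not end in the middle of a multibyte character.
    head.force_encoding(io.external_encoding)

    whole = head + tail
    p whole
    p whole.encoding
    ```

    force_encoding only changes the label on the string, not its bytes, so it is cheap but offers no protection against a chunk that was cut mid-character.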

    Now that you have read to the end of the stream, a subsequent read without a length simply returns an empty string (with a length, it returns nil). In the case of StringIO, at least on the Ruby version you are running, this empty string happens to be binary. I didn't find any documentation about this specific case, but it is clearly defined in MRI's code for the StringIO class.

    a.read
    

    will thus return an empty string in binary encoding.
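
    The end-of-stream behaviour can be checked with a short sketch like the following. The encoding of the empty string is printed rather than assumed, since it may differ between Ruby versions.

    ```ruby
    require 'stringio'

    io = StringIO.new("toto".encode("UTF-8"))
    io.read                 # consume the whole stream

    at_end = io.eof?        # true once everything has been read
    rest   = io.read        # no length at EOF: empty string
    chunk  = io.read(2)     # positive length at EOF: nil

    p at_end
    p rest
    p chunk
    p rest.encoding         # may vary with the Ruby version

    # Rewinding makes the data readable again.
    io.rewind
    again = io.read
    p again
    ```

    If your code depends on the encoding of what read returns, checking eof? before reading avoids the empty-string case altogether.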