ruby quoted-printable

How to properly decode a quoted-printable encoded string in Ruby


I'm trying to decode what I think is quoted-printable encoded text that appears in an MBox email archive. Here is one example of text I'm having trouble with.

In the MBox, the following text appears:

"Demarcation by Theresa Castel=E3o-Lawless"

Properly decoded, I think this should appear as:

"Demarcation by Theresa Castelão-Lawless"

I'm basing this on two things:

1) a web archive of the email, in which the text is rendered as "Demarcation by Theresa Castelão-Lawless"

2) this page, which shows that "=E3" (byte 0xE3) corresponds to "ã" in ISO-8859-1: https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html

I've tried the code below but it gives the wrong output.


require "mail" # the mail gem provides Mail::Encodings

string = "Demarcation by Theresa Castel=E3o-Lawless"

decoded_string = Mail::Encodings::QuotedPrintable.decode(string)

puts decoded_string

The result from the code above is "Demarcation by Theresa Castel?o-Lawless", but as stated above, I want "Demarcation by Theresa Castelão-Lawless".


Solution

  • Try to avoid extra libraries when plain old Ruby can accomplish the task. String#unpack is your friend.

    "Demarcation by Theresa Castel=E3o-Lawless".
      unpack("M").first. # unpack as quoted printable
      force_encoding(Encoding::ISO_8859_1).
      encode(Encoding::UTF_8)
    #⇒ "Demarcation by Theresa Castelão-Lawless"
    

    Or, as suggested in the comments by @Stefan, one can pass the source encoding as the second argument to encode:

    "Demarcation by Theresa Castel=E3o-Lawless".
      unpack("M").first. # unpack as quoted printable
      encode('utf-8', 'iso-8859-1')
    

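    Note: unpack("M") returns the decoded bytes tagged as binary (ASCII-8BIT), so Ruby does not know which character set they belong to. In the first variant, force_encoding relabels those bytes as single-byte ISO-8859-1 (Latin-1, which covers Western European accented characters) without changing them, so that encode can then transcode them to UTF-8. In the second variant, the source encoding passed to encode serves the same purpose.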
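
The two steps above (decode the quoted-printable escapes, then transcode from the original charset) can be wrapped in a small helper. This is only a sketch: the method name decode_quoted_printable and its charset parameter are hypothetical, not part of Ruby or the mail gem, and the default of ISO-8859-1 is an assumption that happens to fit this example. For a real message, the charset would normally come from the Content-Type header of the email part.

```ruby
# Hypothetical helper (not from the original answer): decode quoted-printable
# text and return it as UTF-8. The charset default is an assumption.
def decode_quoted_printable(text, charset = "ISO-8859-1")
  raw = text.unpack("M").first   # raw decoded bytes, no charset attached yet
  raw.encode("UTF-8", charset)   # interpret the bytes as charset, transcode to UTF-8
end

decoded = decode_quoted_printable("Demarcation by Theresa Castel=E3o-Lawless")
puts decoded
```

Passing the charset explicitly keeps the guess about the source encoding in one visible place, rather than relying on whatever encoding label the decoded bytes happen to carry.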