I'm trying to decode what I think is some quoted-printable encoded text that appears in an MBox email archive. I will give one example of some text I am having trouble with.
In the MBox, the following text appears:
"Demarcation by Theresa Castel=E3o-Lawless"
Properly decoded, I think this should appear as:
"Demarcation by Theresa Castelão-Lawless"
I'm basing this on two things:
1) a web archive of the email, in which the text is properly rendered as "Demarcation by Theresa Castelão-Lawless"
and 2) this page, which shows "=E3" as corresponding to "ã" in ISO-8859-1 (the byte value that quoted-printable escapes here): https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html
I've tried the code below but it gives the wrong output.
string = "Demarcation by Theresa Castel=E3o-Lawless"
decoded_string = Mail::Encodings::QuotedPrintable.decode(string)
puts decoded_string + "\n"
The result from the code above is "Demarcation by Theresa Castel?o-Lawless", but as stated above, I want "Demarcation by Theresa Castelão-Lawless".
Try to avoid pulling in extra gems or Rails helpers when plain old Ruby can accomplish the task. String#unpack is your friend.
"Demarcation by Theresa Castel=E3o-Lawless".
unpack("M").first. # unpack as quoted printable
force_encoding(Encoding::ISO_8859_1).
encode(Encoding::UTF_8)
#⇒ "Demarcation by Theresa Castelão-Lawless"
or, as suggested in comments by @Stefan, one can pass the source encoding as the 2nd argument:
"Demarcation by Theresa Castel=E3o-Lawless".
unpack("M").first. # unpack as quoted printable
encode('utf-8', 'iso-8859-1')
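Note: unpack("M") returns a binary (ASCII-8BIT) string, so in the first variant force_encoding is needed to tell the engine that these bytes are single-byte ISO-8859-1 (with European accented characters) before encoding into the target UTF-8. In the second variant, the source encoding passed to encode does the same job.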
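If this has to be done for many headers or bodies, the two variants above can be wrapped in a small helper. This is only a sketch: decode_qp is a made-up name, not part of Ruby or the Mail gem, and the ISO-8859-1 default is an assumption. In a real MBox the charset should come from the message's Content-Type header. The last lines show the intermediate binary string, which is probably why the original output contained "?".

```ruby
# Hypothetical helper (not part of Ruby or the Mail gem).
# Decodes quoted-printable text whose decoded bytes are in `charset`
# and returns a UTF-8 string. The default charset is an assumption.
def decode_qp(text, charset = "ISO-8859-1")
  raw = text.unpack1("M")        # binary string holding the decoded bytes
  raw.encode("UTF-8", charset)   # interpret bytes as `charset`, transcode to UTF-8
end

puts decode_qp("Demarcation by Theresa Castel=E3o-Lawless")

# Inspecting the intermediate value: a single byte 0xE3 in a binary string,
# which is not valid UTF-8 on its own.
raw = "=E3".unpack1("M")
p raw.encoding == Encoding::BINARY   # true
p raw.bytes                          # [227]
```

String#unpack1 requires Ruby 2.4 or later; on older versions, unpack("M").first is equivalent.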