hex quoted-printable gb2312

What is this text: =B0=A1=C1=CB ... and how to convert it to normal text?


I have found some text in this form:

=B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=
=BA=DC=C3=A6=B0=A1

containing mostly sequences consisting of an equal sign followed by two hexadecimal digits.

I am told it could be converted into this Chinese sentence:

啊了你也没联系我最近是不是很忙啊

What is this =B0=A1=C1... notation, and how can I decode or convert it?


Solution

  • The Chinese sentence has been encoded into an 8-bit Guobiao encoding (GB2312, GBK or GB18030; most likely GB18030, though it apparently decodes correctly as GB2312 too), and then further encoded into the 7-bit MIME quoted-printable encoding.

    To decode it into a Unicode string, first undo the quoted-printable encoding, then decode the Guobiao encoding. Here’s an example using Python:

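    import quopri
    
    # Step 1: undo the quoted-printable layer (bytes in, bytes out).
    # Step 2: decode the resulting GB18030 bytes into a Unicode string.
    print(quopri.decodestring(b"""\
    =B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=
    =BA=DC=C3=A6=B0=A1\
    """).decode('gb18030'))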
    

    This outputs 啊了,你也没联系我,最近是不是很忙啊 on my terminal.

    The quoted-printable encoding is usually found in e-mail messages; whether it is actually in use should be determined from the message headers. A message encoded in this manner should carry the header Content-Transfer-Encoding: quoted-printable. The text encoding (gb18030 in this case) should be specified in the charset parameter of the Content-Type header, but can sometimes be determined by other means.
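
When the full message is available, the headers can drive the decoding instead of hard-coding it. The following is only a rough sketch using Python's standard email package. The two headers in the sample message are assumptions made up for illustration (the text in the question came without headers), and raw_message and decode_text_body are hypothetical names:

```python
from email import message_from_bytes, policy

# Hypothetical message: the two headers are assumed for illustration only.
raw_message = (
    b"Content-Type: text/plain; charset=gb18030\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"=B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=\n"
    b"=BA=DC=C3=A6=B0=A1\n"
)


def decode_text_body(raw: bytes) -> str:
    """Parse a single-part text message and return its body as a Unicode string."""
    msg = message_from_bytes(raw, policy=policy.default)
    # get_content() undoes the transfer encoding named in
    # Content-Transfer-Encoding, then decodes using the charset parameter.
    return msg.get_content()


msg = message_from_bytes(raw_message, policy=policy.default)
print(msg["Content-Transfer-Encoding"])  # transfer encoding declared by the sender
print(msg.get_content_charset())         # charset declared in Content-Type
print(decode_text_body(raw_message).strip())
```

This only covers a single-part text message. A multipart message would need its parts walked first, and a message with a missing or wrong charset parameter would still need the encoding guessed by other means.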