
Encode a string to gbk in Python


I am trying to see what different strings would look like in different encodings...

For example:

>>> str1 = "asdf"
>>> str1.encode('utf-16')
'\xff\xfea\x00s\x00d\x00f\x00'
>>> str1.encode('base64')
'YXNkZg==\n'

And those all get me what I want.

But I'd like to see what certain strings would look like in gbk, gb2312, or gb18030.

>>> str1.encode('gbk')
'asdf'
>>> str1.encode('gb2312')
'asdf'
>>> str1.encode('gb18030')
'asdf'

Shouldn't the outputs be something other than 'asdf'?

I have Python 2.7, and I can see gbk.py and the other codec files in lib/encodings.

Is the output unchanged because those letters are encoded the same way in these encodings, or because I need to enable the encodings somehow (some sort of import)?


Solution

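  • Nothing needs to be imported or enabled; the codecs are working. GBK, GB2312, and GB18030 are ASCII-compatible: any character in the ASCII range (code points 0-127) encodes to the same single byte as in ASCII, so 'asdf' comes out unchanged. The same is true for UTF-8. To really see the difference, try with some actual Chinese.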
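
A minimal sketch of that comparison, using the two-character string "中文" as an arbitrary sample (the variable names are just for illustration). It is written with `\u` escapes and a `u` prefix so it should run the same on Python 2.7 and Python 3 regardless of the source file's encoding:

```python
# Sample inputs: one pure-ASCII string, one Chinese string ("中文").
ascii_text = u"asdf"
chinese_text = u"\u4e2d\u6587"

encodings = ["gbk", "gb2312", "gb18030", "utf-8"]

# Encode both strings with every codec and keep the resulting bytes.
ascii_results = dict((name, ascii_text.encode(name)) for name in encodings)
chinese_results = dict((name, chinese_text.encode(name)) for name in encodings)

for name in encodings:
    # %r shows the raw bytes, including \x escapes for non-ASCII values.
    print("%-8s ascii=%r chinese=%r" % (name, ascii_results[name], chinese_results[name]))
```

The ASCII string comes back as the bytes for 'asdf' under every codec, while the Chinese string becomes two bytes per character in the GB family (`\xd6\xd0\xce\xc4`) and three bytes per character in UTF-8 (`\xe4\xb8\xad\xe6\x96\x87`).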