encodingutf-8cjkchinese-locale

Chinese in Japanese encoding


This may sound like a stupid question. I typed some Chinese characters into an empty text file in VS code text editor (default utf8). Then I saved the file in an encoding for Japanese: shift JIS, which apparently doesn't cover all the characters I have typed in.

However, before I close the file, all Chinese characters are displayed properly in VS code. Now after I closed the file and reopened it using shift JIS encoding, several characters are displayed as a question mark ?. I guess these are the Chinese characters not covered by the Japanese encoding?

What happened in the process? Is there anyway I can 'get back' the Chinese characters that are now shown in ?? I don't really understand how encoding works in this scenario...


Solution

  • Not all encodings cover all characters. (Unicode encodings, in principle, do, but even they don't have quite everything yet.) If you save some text in an encoding which does not include all characters in that text, something has to give.

    Options:

    Once that conversion is done, the data is lost, and cannot be recovered. Why not use UTF-8 or another Unicode encoding? (GB 18030 might be the best for large amounts of Chinese text.)