
Why does set fileencoding=cp936 have no effect?


  1. I create a file and run the following command:
:set fileencoding
  2. The result is:
fileencoding=cp936
  3. I edit the file and close it. Then I reopen it and run the same command:
:set fileencoding
  4. The result is:
fileencoding=utf-8

The contents of .vimrc are:

...
set fencs=ucs-bom,utf-8,gbk,gb18030,utf-16,big5
set fenc=cp936
set encoding=utf-8

I am also using a remote connection in VS Code. Why does the value of fileencoding change, and how can I solve this? Thanks!


Here are the results of my attempts:

  1. When the file contains only English text, as shown below, I save and reopen it, then run :set fileencoding. The result is fileencoding=utf-8. Running file test1.c prints test1.c: ASCII text.
//file: test1.c
abc
  2. When the file contains Chinese text, as shown below, I save and reopen it, then run :set fileencoding. The result is fileencoding=cp936. Running file test2.c prints test2.c: ISO-8859 text.
//file:test2.c
你好abc
  3. The .vimrc content is:
...
set fencs=ucs-bom,utf-8,gbk,gb18030,utf-16,big5
set fenc=cp936
set encoding=utf-8

My question is: why is fileencoding utf-8 rather than cp936 when the content is English only?


Solution

  • This is not really a Vim issue but an encoding issue. Vim does what you ask, but for an ASCII-only file the bytes written are the same either way.

    Two pieces of information explain the behavior. The first is that text files contain no meta information about their encoding. A text file is just a sequence of bytes, and how those bytes are interpreted is up to the application, which has to guess. Judging from the bulk of related questions on a popular programming Q&A site, this is hard.
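
    To make "just a sequence of bytes" concrete, here is a minimal Python sketch (Python is used only for illustration; its "gbk" codec is assumed to be a close enough stand-in for what Vim calls cp936). The same four bytes yield different text depending on which encoding the reader assumes:

```python
# Four raw bytes. Nothing in them says which encoding they are in.
raw = bytes([0xC4, 0xE3, 0xBA, 0xC3])

# Interpreted as GBK (Python also accepts the alias "cp936"):
as_gbk = raw.decode("gbk")

# Interpreted as ISO-8859-1, where every byte is one character:
as_latin1 = raw.decode("latin-1")

print(as_gbk)     # 你好
print(as_latin1)  # ÄãºÃ
```

    Both decodes succeed, so the bytes alone cannot tell an application which reading was intended.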

    The second piece of the puzzle is that the first 128 characters of both UTF-8 and CP936 are identical to the ASCII character set. Take a look at the code page file for CP936 and compare it with ASCII.

    This is by design. So for bytes 0x00 to 0x7f, it's just plain ASCII, no matter what encoding you specify.
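
    The overlap can be checked with a short Python sketch. It assumes Python's "cp936" codec (an alias for its GBK codec) matches the code page for the ASCII range, and it uses the test1.c content from the question as sample text:

```python
# Sample ASCII-only content, taken from test1.c in the question.
text = "//file: test1.c\nabc\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
cp936_bytes = text.encode("cp936")  # Python alias for its GBK codec

# All three encodings produce exactly the same bytes for ASCII text.
same_bytes = ascii_bytes == utf8_bytes == cp936_bytes
print(same_bytes)  # True

# The same holds for every byte value from 0x00 to 0x7f.
all_ascii = bytes(range(128))
same_decoding = all_ascii.decode("utf-8") == all_ascii.decode("cp936")
print(same_decoding)  # True
```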

    Using Vim, let's create a simple text file containing "hello world" and take a look at it:

    > xxd ascii.txt 
    00000000: 6865 6c6c 6f20 776f 726c 640a            hello world.
    > file ascii.txt 
    ascii.txt: ASCII text
    

    After running :set fileencoding=cp936 and saving again, you will get identical results.

    Note the complete absence of any encoding meta information. The whole file is just "hello world" and a newline.

    Everything changes once you introduce non-ASCII characters. The first non-ASCII character in CP936 is the Euro sign, encoded as 0x80. So let's write "h€llo world" instead, save it once per encoding, and re-run the file investigations:

    > xxd utf-8.txt 
    00000000: 68e2 82ac 6c6c 6f20 776f 726c 640a       h...llo world.
    > file utf-8.txt
    utf-8.txt: UTF-8 Unicode text
    > xxd cp936.txt
    00000000: 6880 6c6c 6f20 776f 726c 640a            h.llo world.
    > file cp936.txt
    cp936.txt: Non-ISO extended-ASCII text
    

    Note that € is encoded as e2 82 ac as one would expect in UTF-8 while it's encoded as 80 in CP936.

    Also note that file cannot identify the CP936 encoding; it only reports generic extended-ASCII text.
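
    The same divergence shows up with the Chinese text from the question. This is a small Python sketch, not part of the xxd session above; it uses "你好abc" rather than the Euro sign because Python's "cp936" codec is its GBK codec and may not cover every Microsoft extension such as the single-byte 0x80:

```python
text = "你好abc"  # content of test2.c in the question

utf8_bytes = text.encode("utf-8")
cp936_bytes = text.encode("cp936")  # Python alias for its GBK codec

# Three bytes per Chinese character in UTF-8, two in CP936.
print(utf8_bytes.hex())   # e4bda0e5a5bd616263
print(cp936_bytes.hex())  # c4e3bac3616263

# The trailing "abc" (61 62 63) is identical in both; only the
# non-ASCII part differs.
same_tail = utf8_bytes[-3:] == cp936_bytes[-3:] == b"abc"
print(same_tail)  # True
```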

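    As we have seen, there's no difference in file contents as long as only ASCII characters are used. So the bottom line is that Vim does save your ASCII files as CP936, but the bytes on disk are the same as for UTF-8. When you reopen the file, Vim tries the entries of 'fileencodings' in order and keeps the first one that converts without error. With your setting, utf-8 comes before gbk, and an ASCII-only file is valid UTF-8, so fileencoding is reported as utf-8.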

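    To make Vim report cp936 when it opens such files, you would add cp936 near the start of 'fileencodings', ahead of utf-8. Be aware that an entry placed before utf-8 is also tried first on files that really are UTF-8, so this may cause misdetection for those.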
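
    As a rough model of the detection step, the following Python sketch tries a list of candidate encodings in order and returns the first that decodes without error. The function guess_encoding is hypothetical and written for this illustration only; Vim's real logic differs in its details, and the Python codec names are assumed to correspond to Vim's:

```python
import codecs
from typing import Optional, Sequence

# Byte order marks that a "ucs-bom" style entry would look for.
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def guess_encoding(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Hypothetical helper: return the first candidate that fits data."""
    for name in candidates:
        if name == "ucs-bom":
            # Matches only when the data starts with a byte order mark.
            if data.startswith(BOMS):
                return name
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate fails, try the next one
        return name
    return None


# Order taken from 'fencs' in the question's .vimrc.
FENCS = ["ucs-bom", "utf-8", "gbk", "gb18030", "utf-16", "big5"]

ascii_guess = guess_encoding(b"abc\n", FENCS)
chinese_guess = guess_encoding("你好abc\n".encode("gbk"), FENCS)
reordered_guess = guess_encoding(b"abc\n", ["ucs-bom", "gbk", "utf-8"])

print(ascii_guess)      # utf-8
print(chinese_guess)    # gbk
print(reordered_guess)  # gbk
```

    In this model the ASCII-only file is claimed by utf-8 because utf-8 is tried first, the GBK-encoded file falls through to gbk because its bytes are not valid UTF-8, and moving gbk ahead of utf-8 changes the answer for the ASCII file.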