When I cat a file in bash I get the following:
$ cat /tmp/file
microsoft
When I view the same file in vim I get the following:
^@m^@i^@c^@r^@o^@s^@o^@f^@t^@
How can I identify and remove these "non-printable" characters. What does '^@' mean in vim??
(Just a piece of background information: the file was created by base 64 decoding and cutting from the pssh header of an mpd file for Microsoft Playready)
What you see is Vim's visual representation of unprintable characters. It is explained at :help 'isprint'
:
Non-printable characters are displayed with two characters: 0 - 31 "^@" - "^_" 32 - 126 always single characters 127 "^?" 128 - 159 "~@" - "~_" 160 - 254 "| " - "|~" 255 "~?"
Therefore, ^@
stands for a null byte = 0x00. These (and other non-printable characters) can come from various sources, but in your case it's an ...
If you clearly observe your output in Vim, every second byte is a null byte; in between are the expected characters. This is a clear indication that the file uses a multibyte encoding (utf-16
, big endian, no byte order mark to be precise), and Vim did not properly detect that, and instead opened the file as latin1
or so (whereas things worked out properly in the terminal).
To fix this, you can either explicitly specify the encoding:
:edit ++enc=utf-16 /tmp/file
Or tweak the 'fileencodings'
option, so that Vim can automatically detect this. However, be aware that ambiguities (as in your case) make this prone to fail:
For an empty file or a file with only ASCII characters most encodings will work and the first entry of 'fileencodings' will be used (except "ucs-bom", which requires the BOM to be present).
That's why a byte order mark (BOM) is recommended for 16-bit encodings; but that assumes that you have control over the output encoding.