I have a lot of legacy Windows Resource Files (*.rc) in my repository. Initially they used a mix of encodings: UTF-8, ASCII, UTF-16, and others. The repo also had no .gitattributes file, so Git could not produce diffs for the UTF-16 files and treated them as binary.
In my dev branch I converted all of the *.rc files to UTF-16LE-BOM with CRLF line endings and added a .gitattributes file containing:
**/*.rc text working-tree-encoding=UTF-16LE-BOM eol=CRLF
(I converted every file to UTF-16 so that I could use a single *.rc mask instead of listing the files explicitly.)
But when I clone my repo and run git checkout to switch to the dev branch, I see a lot of errors of the form failed to encode '...' from UTF-8 to UTF-16LE-BOM, so it looks like Git treats the files as UTF-8 regardless of the .gitattributes contents. After the checkout, all of the *.rc files are also corrupted.
Before:
#include <winver.h>
// Version Information
#ifndef _DEBUG
After:
#include <winver.h>
ഀഀ
// Version Information
⌀椀昀渀搀攀昀 开䐀䔀䈀唀䜀ഀഀ
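For what it's worth, the garbled characters do not look random: they look like the original text with every UTF-16 byte pair read one byte out of alignment. A minimal Python sketch of that guess follows; the one-byte shift is my assumption about what happened, not something Git reports.

```python
# Assumption: the file's UTF-16LE bytes ended up shifted by one byte,
# so each ASCII character's byte lands in the high half of a code unit.
text = "#ifndef _DEBUG"
data = text.encode("utf-16-le")        # 23 00 69 00 66 00 ...

# Prepend one stray byte and drop the last one to keep whole 2-byte units.
shifted = (b"\x00" + data)[:-1]        # 00 23 00 69 00 66 ...
garbled = shifted.decode("utf-16-le")

# '#' (U+0023) becomes U+2300, 'i' (U+0069) becomes U+6900, and so on.
print(garbled)
```

This would match the "After" sample, where the first line is intact and everything following the first line ending is shifted.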
Also, if I run git status after the checkout, Git shows failed to encode '...' from UTF-16LE-BOM to UTF-8.
Is there any way to fix this?
The documentation for the .gitattributes file has examples for configuring the working tree encoding of UTF-16 files which (1) have a byte order mark (BOM), or (2) lack one. To paraphrase:
If the UTF-16 file has a BOM, set working-tree-encoding=UTF-16. Git apparently inspects the BOM to distinguish little endian from big endian.
If the UTF-16 file lacks a BOM, set working-tree-encoding=UTF-16LE or working-tree-encoding=UTF-16BE, depending on the byte order in use.
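To decide which of the two cases applies to a given file, you can look at its first bytes. The following is a rough Python sketch, not anything Git itself provides: the function name is hypothetical, and the endianness guess for BOM-less files is a heuristic that assumes mostly ASCII content, as in typical .rc source.

```python
import codecs

def suggest_working_tree_encoding(data: bytes) -> str:
    """Suggest a working-tree-encoding value from a file's raw bytes."""
    # A UTF-16 BOM settles it: per the rule above, plain UTF-16 is enough.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # No BOM: guess the byte order from where the zero bytes fall.
    # Heuristic only; it assumes the text is mostly ASCII.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if odd_zeros > even_zeros:
        return "UTF-16LE"   # ASCII characters stored as <char> 00
    if even_zeros > odd_zeros:
        return "UTF-16BE"   # ASCII characters stored as 00 <char>
    return "unknown"        # e.g. plain ASCII or UTF-8

# Usage (hypothetical file name):
# with open("version.rc", "rb") as f:
#     print(suggest_working_tree_encoding(f.read(4096)))
```

Reading only the first few kilobytes is usually enough for this kind of check, but treat the result as a hint and confirm it in a hex viewer before committing an attribute for the whole repository.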
The encoding type UTF-16LE-BOM is never mentioned in the documentation; however, Git supports the encodings listed when you run iconv --list from a terminal (or from Git Bash on Windows). UTF-16LE-BOM does not appear as a supported encoding on either Ubuntu 18.04 or Windows 10.
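If you want to repeat that check on your own machine without scanning the list by eye, something like the Python sketch below should work. The helper names are hypothetical, and the parsing assumes that names in the iconv --list output are separated by whitespace or commas and may carry a trailing //, which can differ between iconv implementations.

```python
import re
import subprocess

def encoding_listed(listing: str, name: str) -> bool:
    """Return True if `name` appears in text printed by `iconv --list`."""
    # Assumption: names are split by whitespace or commas and may end in "//".
    tokens = re.split(r"[\s,]+", listing)
    names = {tok.rstrip("/").upper() for tok in tokens if tok}
    return name.upper() in names

def iconv_listing() -> str:
    """Run `iconv --list` and return its output (iconv must be on PATH)."""
    result = subprocess.run(
        ["iconv", "--list"], capture_output=True, text=True, check=True
    )
    return result.stdout

# Usage (needs iconv installed):
# listing = iconv_listing()
# for candidate in ("UTF-16", "UTF-16LE", "UTF-16LE-BOM"):
#     print(candidate, encoding_listed(listing, candidate))
```

The comparison is case-insensitive because iconv implementations generally accept encoding names in either case.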