git, utf-8, utf-16

Git failed to encode from UTF-8 to UTF-16LE-BOM


I have a lot of legacy Windows resource files (*.rc) in my repository. Originally they used a mix of encodings: UTF-8, ASCII, UTF-16, and others. There was also no .gitattributes file in the repo, so Git could not produce diffs for the UTF-16 files and treated them as binary.

In my dev branch I converted all of the *.rc files to UTF-16LE with a BOM and CRLF line endings, and added a .gitattributes file containing **/*.rc text working-tree-encoding=UTF-16LE-BOM eol=CRLF. (I converted every file to UTF-16 so that I could use a single *.rc pattern instead of listing the files explicitly.)

But when I clone the repo and run git checkout to switch to the dev branch, I get a lot of errors of the form failed to encode '...' from UTF-8 to UTF-16LE-BOM, so it looks like Git treats the files as UTF-8 regardless of the contents of .gitattributes. After the checkout, all of the *.rc files are also corrupted.

Before:

#include <winver.h>

// Version Information
#ifndef _DEBUG

After:

#include <winver.h>
਍ഀഀ
// Version Information
਍⌀椀昀渀搀攀昀 开䐀䔀䈀唀䜀ഀഀ

Also, if I run git status after the checkout, Git reports failed to encode '...' from UTF-16LE-BOM to UTF-8.

Is there any way to fix this?
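
The garbling in the After sample follows a recognizable pattern: every other line is unreadable, and the unreadable lines are full of CJK-looking characters. As a minimal sketch, assuming the UTF-16LE bytes had their line endings converted byte by byte (an assumption offered to illustrate the pattern, not a confirmed account of what Git did internally), the following Python reproduces it. The helpers bytewise_lf_to_crlf and simulate_corruption are hypothetical names used only for this illustration.

```python
def bytewise_lf_to_crlf(raw: bytes) -> bytes:
    """Insert a CR byte before every LF byte, ignoring the text encoding."""
    return raw.replace(b"\n", b"\r\n")


def simulate_corruption(text: str) -> str:
    """Encode as UTF-16LE, convert line endings bytewise, decode as UTF-16LE."""
    raw = text.encode("utf-16-le")
    # Each inserted 0x0D byte shifts the following bytes by one, so every
    # other line is read with its 16-bit code units misaligned.
    shifted = bytewise_lf_to_crlf(raw)
    return shifted.decode("utf-16-le", errors="replace")


before = "#include <winver.h>\r\n\r\n// Version Information\r\n#ifndef _DEBUG\r\n"
after = simulate_corruption(before)
# Drop bare CRs so the terminal shows one logical line per row.
print(after.replace("\r", ""))
```

With this input, the #include line and the comment line stay readable, while the empty line and the #ifndef line come out as the same characters shown in the After sample. For example, '#' (0x23) followed by a stray zero byte is read as U+2300 instead of U+0023.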


Solution

  • The documentation for the .gitattributes file has examples for configuring the working tree encoding of UTF-16 files which (1) have a byte order mark (BOM), or (2) lack one. To paraphrase:

    If the UTF-16 file has a BOM, set working-tree-encoding=UTF-16. Git apparently inspects the BOM to distinguish little endian from big endian.

    If the UTF-16 file lacks a BOM, set working-tree-encoding=UTF-16LE or working-tree-encoding=UTF-16BE depending on the text encoding in use.

    The encoding name UTF-16LE-BOM is never mentioned in the documentation. Git supports the encodings that are listed when you run iconv --list from a terminal (or from Git Bash on Windows), and UTF-16LE-BOM does not appear in that list on either Ubuntu 18.04 or Windows 10.
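
The two rules paraphrased above can be sketched in Python as a small check on a file's first bytes. This is only an illustration: bom_kind and suggested_encoding are hypothetical helpers, and the assumed_order parameter is an assumption of this sketch, since the byte order of a BOM-less file cannot be read from a marker and has to be supplied by whoever knows how the file was written.

```python
import codecs
from typing import Optional


def bom_kind(raw: bytes) -> Optional[str]:
    """Return 'LE' or 'BE' if raw starts with a UTF-16 BOM, else None."""
    # The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
    # so rule out UTF-32 first.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return None
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "LE"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "BE"
    return None


def suggested_encoding(raw: bytes, assumed_order: str = "LE") -> str:
    """Map the paraphrased rule to a working-tree-encoding value."""
    if bom_kind(raw) is not None:
        # BOM present: plain UTF-16, the BOM carries the byte order.
        return "UTF-16"
    # No BOM: the byte order must be named explicitly.
    return "UTF-16" + assumed_order


# Python's "utf-16" codec prepends a BOM; "utf-16-le" does not.
with_bom = "#include <winver.h>\r\n".encode("utf-16")
without_bom = "#include <winver.h>\r\n".encode("utf-16-le")
print(suggested_encoding(with_bom))
print(suggested_encoding(without_bom))
```

For files like the ones in the question (UTF-16 with a BOM), the sketch lands on working-tree-encoding=UTF-16, which is the value the documentation's example uses for that case.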