git, encoding, git-diff, byte-order-mark

Is there any way for git diff to diff files encoded in UTF-8 BOM properly?


I have a Git repository in which all files are encoded in UTF-8 with BOM. This encoding causes nothing but problems, but I have no choice in the matter. When I run git diff to check changes between branches or commits, Git sometimes thinks the entire file changed because the BOM was stripped on one side, which makes the diff output useless most of the time.

I have tried the -w option, but it didn't work. I have also seen answers that filter out the BOM, but I can't get them to work on Windows 11, either in PowerShell or in Git Bash. Besides, I think their output would be misleading if the BOM actually was removed in a commit.


Solution

  • In UTF-8, the BOM is optional and not recommended. RFC 3629 calls it "useless".

    Git does not strip the BOM, nor does it consider the BOM to be whitespace, which is why -w makes no difference. git diff is meant to show you what actually changed, even when that is inconvenient, so it is doing the right thing here.

    Git has special handling for UTF-16 with a BOM in the working-tree-encoding attribute, which is set in .gitattributes. There is no such handling for UTF-8 with a BOM, because that encoding is rare and generally considered a mistake.

    That said, you could run iconv -l in Git Bash and check whether UTF-8-BOM is listed. If it is, you could set it as the working-tree-encoding in .gitattributes, add the .gitattributes file, then run git add --renormalize . and commit. That stores the files as UTF-8 without a BOM in the repository and writes them as UTF-8 with a BOM in the working tree, which will unbreak diffs. Note that this configuration will not work on Linux, which does not have that encoding, so the repository will be broken there. I don't have a Mac handy to test, so I can't speak to how it behaves there.

    You could ask on the Git mailing list for this encoding to be added, but you will likely be advised to drop the BOM instead. So if you make that request, or want to send a patch, explain in some detail why your configuration is compelling and valuable to support.

    Besides the working-tree-encoding option, there is no way to make Git ignore your BOM in diff output.
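To make the point about what the diff is reacting to concrete: the UTF-8 BOM is the code point U+FEFF encoded as the three bytes EF BB BF, and a line diff that works on the real contents reports the first line as changed when those bytes disappear. The sketch below uses Python's difflib as a stand-in for git diff (an assumption for illustration only; Git uses its own diff engine), and the helper names starts_with_bom and line_diff are hypothetical.

```python
import difflib

# The UTF-8 BOM is the code point U+FEFF encoded as three bytes.
BOM = "\ufeff".encode("utf-8")
assert BOM == b"\xef\xbb\xbf"

# None of the three bytes is ASCII whitespace, so an option that ignores
# whitespace (assuming an ASCII notion of whitespace) has nothing to skip.
assert not any(byte in b" \t\n\r\v\f" for byte in BOM)


def starts_with_bom(data: bytes) -> bool:
    """Return True if the raw file contents begin with the UTF-8 BOM."""
    return data.startswith(BOM)


def line_diff(old: bytes, new: bytes) -> list:
    """Unified line diff that keeps the BOM: plain "utf-8" decoding turns
    it into a visible U+FEFF character instead of silently dropping it."""
    a = old.decode("utf-8").splitlines()
    b = new.decode("utf-8").splitlines()
    return list(difflib.unified_diff(a, b, "old", "new", lineterm=""))


if __name__ == "__main__":
    with_bom = BOM + b"line one\nline two\nline three\n"
    without_bom = b"line one\nline two\nline three\n"
    print(starts_with_bom(with_bom), starts_with_bom(without_bom))  # True False
    for line in line_diff(with_bom, without_bom):
        print(repr(line))  # repr makes the invisible U+FEFF show up as \ufeff
```

In this sketch the hunk marks only the first line as changed, and the removed version differs from the added one solely by the leading U+FEFF.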
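The first step of the working-tree-encoding approach, checking whether iconv -l lists UTF-8-BOM, can be scripted. The sketch below is hypothetical: it assumes iconv -l prints encoding names separated by whitespace and/or commas, optionally with a trailing "//" (GNU libiconv and glibc format their listings differently), and it compares names case-insensitively. It only reports what iconv lists; whether Git then accepts the encoding still has to be tried.

```python
import re
import shutil
import subprocess
from typing import Optional


def encoding_listed(listing: str, name: str) -> bool:
    """Return True if `name` appears in text printed by `iconv -l`.

    Assumes names are separated by whitespace and/or commas and may carry
    a trailing "//" (as in glibc's listing); matching is case-insensitive.
    """
    wanted = name.upper()
    for token in re.split(r"[\s,]+", listing):
        if token.rstrip("/").upper() == wanted:
            return True
    return False


def iconv_supports(name: str) -> Optional[bool]:
    """Run `iconv -l` and look for `name`; None if iconv is not on PATH."""
    exe = shutil.which("iconv")
    if exe is None:
        return None
    result = subprocess.run([exe, "-l"], capture_output=True, text=True, check=True)
    return encoding_listed(result.stdout, name)


if __name__ == "__main__":
    # Run from Git Bash so the same iconv the answer refers to is found.
    print(iconv_supports("UTF-8-BOM"))
```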
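The effect of the working-tree-encoding setup (UTF-8 without a BOM in the repository, UTF-8 with a BOM in the working tree) can be modeled with Python's built-in "utf-8-sig" codec, which skips a leading BOM when decoding and always writes one when encoding. This is only a model of the conversion, not Git's implementation; the function names are hypothetical, and it assumes the stored content is already BOM-free, which is what git add --renormalize . is meant to achieve.

```python
def to_repository(worktree_bytes: bytes) -> bytes:
    """Working-tree file (UTF-8, BOM optional) -> content stored in Git (no BOM)."""
    # "utf-8-sig" decoding drops a leading BOM if one is present.
    return worktree_bytes.decode("utf-8-sig").encode("utf-8")


def to_worktree(repo_bytes: bytes) -> bytes:
    """Stored content (UTF-8, assumed BOM-free) -> working-tree file with a BOM."""
    # "utf-8-sig" encoding always prepends the BOM (EF BB BF).
    return repo_bytes.decode("utf-8").encode("utf-8-sig")


if __name__ == "__main__":
    edited_with_bom = b"\xef\xbb\xbfline one\nline two\n"
    edited_without_bom = b"line one\nline two\n"
    # Both versions normalize to the same stored content, so the BOM
    # no longer shows up as a change between commits.
    assert to_repository(edited_with_bom) == to_repository(edited_without_bom)
    # Checking out writes the BOM back.
    assert to_worktree(to_repository(edited_without_bom)) == edited_with_bom
    print("round trip ok")
```

The first assertion also illustrates the trade-off raised in the question: in this model, a commit that drops the BOM from a working-tree file produces no difference in the stored content at all.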