Tags: git, github, newline, core.autocrlf

How to git commit some files with CRLF and others with LF on Mac?


I have a repo that contains 7 files with CRLF line endings and everything else with LF. I'm working on a Mac, which uses LF by default.

When I commit my changes with Git and push them to GitHub, and my colleague pulls the branch, those 7 files show up as LF.

What I want is to commit everything as is on my Mac (i.e. 7 files with CRLF and everything else with LF) and also make sure that when someone pulls my branch on a Mac or Linux they get exactly the same thing as I committed. Bear in mind I've already committed these 7 files with LF (accidentally), as I didn't know how Git changes these things. So I need to make sure the changes get to the remote branch as well.

I read a handful of Q&As, but mostly got confused between core.autocrlf and .gitattributes. Any help and/or explanation of how to achieve this is much appreciated.


Solution

  • You have to be kind of careful here, because Git has some built-in conversions, and they strongly favor LF-only line endings. If you use the built-in conversions, that's what you'll actually get in your repository. But we must distinguish between what's in the repository and what's in your work-tree.

    You do not work with files in the repository, because they're unsuitable for doing work. You work with files in your work-tree. And, in one sense, Git doesn't even store files, because the basic unit of Git storage is the commit. But commits themselves do store (snapshots of) files, so in that sense, Git stores files. It's just that you get them a whole commit at a time: all of the commit, or none of it.

    The files inside each Git commit are stored in a special, read-only, Git-only, compressed form. The other programs on your computer, including your editors and file-viewers, cannot work with these files.[1] They're great for archival and completely useless, at least on their own, for getting any new work done.

    So, when you use git checkout to pick a particular commit, Git extracts these files. They come out of the commit, going from the special, read-only, Git-only, freeze-dried format, to plain ordinary files. These uncompressed and rehydrated files can be used by every program on your computer. Those are the files you will see and work with, and Git copies them into your work-tree: the area in which you do your work.

    The work-tree is not really part of the repository. And, when Git does file-format changes according to .gitattributes or core.autocrlf or any of these other options you can choose, Git only does them when copying files from the index to the work-tree, and when copying files from the work-tree to the index. We haven't touched on the index yet, but it's time to do that.


    [1] Some editors probably can, at this point: I'd be surprised if there is no GNU Emacs mode for viewing Git internal objects, for instance. :-) But most can't, and don't need to anyway.


    The index sits between the commits and the work-tree

    The index is perhaps best described, in one phrase, as the place where you build your next commit. This thing (which Git calls, variously, the index, the staging area, or, rarely these days, the cache) actually has multiple functions. In particular it's quite crucial when dealing with a conflicted merge. But for our purposes here, we only care about the fact that it sits between the committed files and the work-tree copies of those files.[2]

    That is: when you ask Git to check out a particular commit, what Git does is to copy[3] the files that are stored in that commit into the index. Unlike the frozen files in commits, though, the copies in the index can be overwritten. Then, having copied the commit to the index, Git now copies the index's files to the work-tree.

    This last step, copying the index file to the work-tree, is when Git does end-of-line conversions. There is only one conversion Git can do here on its own: it can turn newline-terminated lines into CRLF-terminated lines.
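    As a rough sketch of that index-to-work-tree step, you can stage an LF-only file, delete the work-tree copy, and have Git extract it again. The throwaway repository and the file name demo.txt are made up for the demo, and `text eol=crlf` is just one of the settings that switches this conversion on. I use git checkout-index here because it does only the "copy index file to work-tree" part:

```shell
#!/bin/sh
# Sketch: watch Git add carriage returns on the way out of the index.
# The temporary repository and demo.txt are hypothetical names for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf false               # ignore any global autocrlf setting
echo 'demo.txt text eol=crlf' > .gitattributes
printf 'one\ntwo\n' > demo.txt               # LF-only content
git add .gitattributes demo.txt              # index copy is LF-only
rm demo.txt
git checkout-index -f -- demo.txt            # copy index -> work-tree

# Count carriage returns in each copy.
index_cr=$(git cat-file -p :demo.txt | tr -cd '\r' | wc -c | tr -d ' ')
worktree_cr=$(tr -cd '\r' < demo.txt | wc -c | tr -d ' ')
echo "index CRs: $index_cr, work-tree CRs: $worktree_cr"
```

    This should report no carriage returns in the index copy and two in the re-extracted work-tree copy. Git may print a warning about line endings during git add; that is expected here.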

    Now that all your files are in your work-tree, you can work on them as much as you like. You can keep the line endings the same, or change them as you like. You can replace files wholesale or edit them with editors or whatever you want. They're just files and they are totally under your control.

    Now that you've changed these files, though, you might want your next commit to have the updated files. Here, you must run git add: what git add does is to copy the work-tree file into the index. This compresses and otherwise Git-ifies the file, so that it's now in the freeze-dried format in the index, ready to be committed.[4] And, once again, there is only one conversion built into Git here: it can replace CRLF line endings with newline-only line endings.
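    To see the work-tree-to-index conversion in isolation, here is a minimal sketch. The scratch repository and demo.txt are hypothetical, and marking the file `text` in .gitattributes is only one way of turning the conversion on:

```shell
#!/bin/sh
# Sketch: observe the CRLF -> LF conversion that `git add` can perform.
# The temporary repository and demo.txt are hypothetical names for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf false               # ignore any global autocrlf setting
echo 'demo.txt text' > .gitattributes        # enable the built-in conversion
printf 'one\r\ntwo\r\n' > demo.txt           # work-tree copy uses CRLF
git add .gitattributes demo.txt              # copy work-tree -> index

# Count carriage returns in each copy.
worktree_cr=$(tr -cd '\r' < demo.txt | wc -c | tr -d ' ')
index_cr=$(git cat-file -p :demo.txt | tr -cd '\r' | wc -c | tr -d ' ')
echo "work-tree CRs: $worktree_cr, index CRs: $index_cr"
```

    The work-tree file keeps its two carriage returns, while the copy that went into the index should have none. Note that git add does not rewrite the work-tree file; only the index copy is converted.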

    Git cannot change to CRLF endings in the repository

    Take note of the two transformations that are built into Git. All of the control settings, whether in .gitattributes or not, are just turning these conversion settings on or off. Either Git turns newlines into CRLF on the way out—from index to work-tree—or it doesn't. Either Git turns CRLF into newlines on the way in, from work-tree to index, or it doesn't. There is no process by which Git can turn newline-endings to CRLF-endings in the index.

    You can, of course, simply not use the transformations that are built into Git. But if you want to work with CRLF endings, yet store newline-only endings in the repository, you can arrange for that to occur. The real questions here are:

    If the answer to the last question is that they are acceptable and yes, then forget about what's in the repository and concentrate on getting the right setting in .gitattributes. Git will do the called-for transformations during file extraction operations and git add operations.
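    For that case (LF-only in the repository, CRLF in the work-tree), a sketch of what the .gitattributes entry might look like. The path data/legacy.csv is a placeholder for one of your seven files, and git check-attr is only there to confirm that Git reads the setting the way you meant it:

```shell
#!/bin/sh
# Sketch: mark one file as "LF in the repository, CRLF in the work-tree".
# data/legacy.csv is a hypothetical path; the file need not exist for check-attr.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# One line per file (or glob pattern) that should have CRLF in the work-tree.
echo 'data/legacy.csv text eol=crlf' > .gitattributes
# Ask Git which attribute values it sees for that path.
attrs=$(git check-attr text eol -- data/legacy.csv)
echo "$attrs"
```

    This should print one line per attribute, reporting text as set and eol as crlf. Because .gitattributes is itself a committed file, anyone who checks out the branch gets the same conversion rules, whatever their platform default is.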

    If the answer to the second question is that you care about what's in the repository, and you want CRLF endings there, you probably shouldn't use Git's built-in conversions. You can write your own smudge and clean filters to do what you want: make your "clean" filter store lines with CRLF endings. (The fact that there are no built-in methods to do this means that if you think you want that, you should be pretty sure: the Git folks do try to cover Windows and macOS reasonably well.)


    [2] In a sense, the index isn't required. Other version control systems function just fine without one. But here, we must know about it, because it's how all this stuff actually works in Git.

    [3] Git doesn't actually copy them. Because of their dehydrated / freeze-dried form, Git can just refer to the committed files. What's in the index is really Git internal blob object hash IDs, plus each file's name, plus a lot of make-it-go-fast cache data, all arranged in a way that is suitable for Git, and not suitable for anything else. But unless you start looking at the parts of the index, e.g., with git ls-files --stage or git update-index, none of this really matters. You can think of what's in the index as copies of files, and it all works out.
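    If you do want to look, here is a small sketch in a scratch repository with two made-up files. git ls-files --stage is the command mentioned above; the --eol option is my addition, and needs a reasonably recent Git, but it is handy here because it reports the line endings of the index copy and the work-tree copy side by side:

```shell
#!/bin/sh
# Sketch: peek at the index entries and their line endings.
# The temporary repository, crlf.txt and lf.txt are hypothetical names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf false               # no conversions in this demo
printf 'one\r\ntwo\r\n' > crlf.txt
printf 'one\ntwo\n' > lf.txt
git add crlf.txt lf.txt
git ls-files --stage                         # mode, blob hash ID, stage number, path
eol_report=$(git ls-files --eol)             # i/ = index copy, w/ = work-tree copy
echo "$eol_report"
```

    With no conversions enabled, the --eol report should show i/crlf and w/crlf for crlf.txt, and i/lf and w/lf for lf.txt. Running the same command in your real repository is a quick way to check which of your seven files are actually stored with CRLF.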

    [4] Technically, git add makes a new blob object, or re-uses some existing blob object if there is one with the right content. Then it puts the blob hash into the index, as noted in footnote 3. But again you can just think of this as a copy operation: it works out the same.
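    The blob re-use is easy to observe. In this sketch (scratch repository, made-up file names), two files with identical content end up pointing at the same blob hash ID in the index:

```shell
#!/bin/sh
# Sketch: identical content is stored once and shared by both index entries.
# The temporary repository, a.txt and b.txt are hypothetical names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'same content\n' > a.txt
printf 'same content\n' > b.txt
git add a.txt b.txt
# Field 2 of `git ls-files --stage` output is the blob hash ID.
hash_a=$(git ls-files --stage a.txt | awk '{print $2}')
hash_b=$(git ls-files --stage b.txt | awk '{print $2}')
echo "a.txt -> $hash_a"
echo "b.txt -> $hash_b"
```

    Both lines should show the same hash ID, since the hash is computed from the content alone and not from the file name.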