
Possible to change git line break to period (for better sentence-based diffs)?


Is it possible to change the line break used by git to something other than the default \n (e.g. a period . or period plus whitespace)?

I am asking because this would make it easier to use git to manage text files such as documentation and markdown files. I have seen articles suggesting people to put each sentence in its own line just so that it is treated as one unit by git (rather than a part of a longer paragraph), which is awkward. Hence the question here.

I searched online but found nothing.


Solution

  • interesting idea! But sorry, no.

    I upvoted your question because I love the idea. Unfortunately the answer is: No, Git does not support this.

    As stated in the git config documentation, the valid values for core.eol are lf, crlf and native:

    Sets the line ending type to use in the working directory for files that are marked as text (either by having the text attribute set, or by having text=auto and Git auto-detecting the contents as text). Alternatives are lf, crlf and native, which uses the platform’s native line ending. The default value is native. See gitattributes[5] for more information on end-of-line conversion. Note that this value is ignored if core.autocrlf is set to true or input.

    Other related git config settings are core.safecrlf and core.autocrlf. The gitattributes documentation says the same.

    why git is unlikely to ever support this

    lf and cr are control characters with a very specific meaning. Regular characters such as the period . have many meanings depending on context. In many languages it marks the end of a sentence, but it means something different in numbers. And ... is often used as an ellipsis, which is not three sentence endings.

    So git supporting such an option would result in a mess for many text files stored in a git repo.

    a workaround: use a git commit hook to automatically insert lf after every period in your text file that doesn't have one.

    It would be a pretty simple regular expression to do that.
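As a rough illustration of that regular expression, here is a minimal Python sketch. The function name `one_sentence_per_line` and the rule it uses are assumptions for illustration only: a sentence is taken to end at `.`, `!` or `?` followed by spaces and then an uppercase letter or an opening quote or bracket. That rule leaves decimals such as 3.14 and a lowercase continuation after `...` alone, but it will wrongly split after abbreviations such as "Dr. Smith", which is exactly the ambiguity described above.

```python
import re

# Assumed heuristic (not part of git): break where sentence-ending punctuation
# is followed by spaces/tabs and then an uppercase letter, quote or bracket.
_SENTENCE_BREAK = re.compile(r'(?<=[.!?])[ \t]+(?=[A-Z"\'(\[])')


def one_sentence_per_line(text: str) -> str:
    """Replace the whitespace between two sentences with a single newline.

    Existing newlines are left untouched, so running the function twice
    gives the same result as running it once.
    """
    return _SENTENCE_BREAK.sub("\n", text)


sample = "Pi is about 3.14. It is irrational. Wait... what?"
print(one_sentence_per_line(sample))
```

A script in `.git/hooks/pre-commit` could apply a function like this to the staged text files and re-stage them; that wiring is left out here because it depends on which files you want reformatted.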

    By trying this approach you will discover one of two things:

    why you really don't need this

    The reason there are "articles suggesting people to put each sentence in its own line" is because git diff used to support only line granularity diffs. Line diffs work great for code but suck for prose. Inserting a sentence or even editing one word results in the whole paragraph being marked as changed unless the paragraph is broken up into lines.

    But git diff now supports word granularity if you use the --word-diff[=<mode>], --word-diff-regex=<regex> or --color-words[=<regex>] option.

    Type git help diff or see git-diff Documentation for more info.
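To make the options above concrete, here is a small Python sketch that builds and runs a `git diff` word-diff command. The helper names `word_diff_command` and `word_diff` are hypothetical, and `SENTENCE_REGEX` is only an illustrative pattern (not something taken from the git documentation) that treats each run of text up to sentence-ending punctuation as one "word", which approximates a sentence-level diff within a line.

```python
import subprocess
from typing import List, Optional

# Illustrative assumption: one "word" is a run of text ending in . ! or ?
SENTENCE_REGEX = r"[^.!?]+[.!?]*"


def word_diff_command(path: str, word_regex: Optional[str] = None) -> List[str]:
    """Build the argument list for a word-granularity diff of one file."""
    cmd = ["git", "diff", "--word-diff=plain"]
    if word_regex is not None:
        # Each match of the regex is treated as a single word by git diff.
        cmd.append(f"--word-diff-regex={word_regex}")
    cmd += ["--", path]
    return cmd


def word_diff(path: str, word_regex: Optional[str] = None) -> str:
    """Run the diff inside the current repository and return its output."""
    result = subprocess.run(
        word_diff_command(path, word_regex),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Inside a repository, `print(word_diff("README.md"))` would show changed words inline, and passing `SENTENCE_REGEX` as the second argument would group changes by sentence instead. `README.md` is just a placeholder file name.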