git, whitespace, git-diff

Git diff command that shows differences in words as a whole and in any single character that isn't alphanumeric


I am using git to help me manage a text, and I need git to show me changes in each word as a whole, and in the punctuation and in the spacing as well.

Let's say I have the sentence "I hold the big target, and I am hit", which is changed to "I fold the target; and I am fit". I expect git diff to show me something like this: I [-hold-]{+fold+} the [-big -]target[-,-]{+;+} and I am [-hit-]{+fit+}, so that both the old and the new text can be correctly deduced.

I found this helpful git command:

git diff --color-words='[^[:space:]]|([[:alnum:]]|UTF_8_GUARD)+'

in "How can I visualize per-character differences in a unified diff file?". As I understand it, it also shows changes in any non-alphanumeric character, such as (in my case) changes in punctuation, thanks to the [^[:space:]] part.

However, this command doesn't correctly display spaces when a word is deleted, like the word "big" in the given example. The result of this command (where I changed --color-words to --word-diff-regex for display purposes) is the following:

I [-hold-]{+fold+} the [-big-]target[-,-]{+;+} and I am [-hit-]{+fit+}

as if the original sentence had no space between "big" and "target", so that it is no longer possible to correctly deduce the old text.

Someone else reported a similar whitespace issue when using this command here: "How to have git-diff ignore all whitespace-change but leading one?". The answers to that post imply that there is no easy solution, and the accepted answer uses whitespace substitution with awk and sed to achieve the desired result.

Why not just replace [^[:space:]] with . to get git diff --color-words='.|([[:alnum:]]|UTF_8_GUARD)+' instead? Or are there side effects that I am not aware of?


Solution

  • This command however doesn't correctly show whitespace changes

    It isn't trying very hard to preserve whitespace in deleted text because, when you told it which changes you're looking for, you didn't mention whitespace.

    Word diffing works by running an ordinary line diff, then rerunning it on changed lines after putting line breaks between identified words, then reconstructing the original line boundaries for presentation with color and/or other embedded change-boundary markers.

    It can be made to care about every character by specifying a word regex that makes every character a component of some "word" that needs to be diffed.

    git diff --color-words='.|[[:alnum:]_]+'

    Done. Now every character that isn't in an alnum-or-underscore sequence gets treated as its own word and diffed like any other text.
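To see why the choice of word regex decides whether deleted whitespace survives, here is a rough Python sketch of the tokenize, diff, and re-render idea described above. It is not git's implementation: word_diff and strip_to_old are hypothetical helpers, difflib stands in for git's diff engine, and the [-...-]{+...+} rendering only approximates --word-diff=plain. Python's alternation is ordered, so the sketch lists the word run first (\w+|.), on the assumption that git's POSIX regex prefers the longest alternative.

```python
import re
from difflib import SequenceMatcher


def word_diff(old, new, word_regex):
    """Approximate git's plain word diff (a sketch, not git's algorithm)."""
    # Each regex match is one "word"; unmatched characters are mere separators.
    a = list(re.finditer(word_regex, old))
    b = list(re.finditer(word_regex, new))
    matcher = SequenceMatcher(
        None, [m.group() for m in a], [m.group() for m in b], autojunk=False
    )
    out, pos = [], 0  # pos: how much of `new` has been emitted so far
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            end = b[j2 - 1].end()
            out.append(new[pos:end])  # common text, separators taken from `new`
            pos = end
            continue
        # Emit separators from `new` that precede the changed region.
        gap_end = b[j1].start() if j1 < len(b) else len(new)
        out.append(new[pos:gap_end])
        pos = gap_end
        if i2 > i1:  # removed words, copied from `old`
            out.append("[-" + old[a[i1].start():a[i2 - 1].end()] + "-]")
        if j2 > j1:  # added words, copied from `new`
            end = b[j2 - 1].end()
            out.append("{+" + new[pos:end] + "+}")
            pos = end
    out.append(new[pos:])
    return "".join(out)


def strip_to_old(marked):
    """Rebuild the old text from the markup: drop additions, unwrap deletions."""
    without_added = re.sub(r"\{\+.*?\+\}", "", marked)
    return re.sub(r"\[-(.*?)-\]", r"\1", without_added)


OLD = "I hold the big target, and I am hit"
NEW = "I fold the target; and I am fit"

NON_SPACE = r"\w+|\S"   # rough analogue of '[^[:space:]]|[[:alnum:]]+'
EVERY_CHAR = r"\w+|."   # rough analogue of '.|[[:alnum:]_]+'

print(word_diff(OLD, NEW, NON_SPACE))
# I [-hold-]{+fold+} the [-big-]target[-,-]{+;+} and I am [-hit-]{+fit+}
print(word_diff(OLD, NEW, EVERY_CHAR))
# I [-hold-]{+fold+} the [-big -]target[-,-]{+;+} and I am [-hit-]{+fit+}
```

With the non-space regex, the space after "big" is never a token, so it cannot show up inside the deletion marker and the old text cannot be rebuilt. Once every character is part of some token, strip_to_old recovers the original sentence exactly. Real git may align ambiguous spaces differently (for example, attaching the deleted space before "big" rather than after it), so treat the exact marker placement as illustrative.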