I am using git to help me manage a text, and I need git to show me changes in each word as a whole, and in the punctuation and in the spacing as well.
Let's say I have the sentence
I hold the big target, and I am hit
which is changed to
I fold the target; and I am fit
I expect git diff to show me something like this:
I [-hold-]{+fold+} the [-big -]target[-,-]{+;+} and I am [-hit-]{+fit+}
so that both the old and the new text can be correctly deduced.
I found this helpful git command in "How can I visualize per-character differences in a unified diff file?":
git diff --color-words='[^[:space:]]|([[:alnum:]]|UTF_8_GUARD)+'
As I understand it, it also shows changes in any non-alphanumeric character, such as (in my case) changes in punctuation, thanks to the [^[:space:]] part.
However, this command doesn't display spaces correctly when a word is deleted, like the word big in the given example. The result of this command (where I changed --color-words to --word-diff-regex for display purposes) is the following:
I [-hold-]{+fold+} the [-big-]target[-,-]{+;+} and I am [-hit-]{+fit+}
as if there were no space between big and target in the original sentence, so that it is no longer possible to deduce the old text correctly.
Someone else reported a similar whitespace issue with this command in "How to have git-diff ignore all whitespace-change but leading one?". The answers to that post imply that there is no easy solution, and the accepted answer uses whitespace substitution with awk and sed to achieve the desired result.
Why not just replace [^[:space:]] with . to get
git diff --color-words='.|([[:alnum:]]|UTF_8_GUARD)+'
instead? Or are there side effects that I am not aware of?
This command however doesn't correctly show whitespace changes
It isn't trying very hard to preserve whitespace in deleted text because, when you told it which changes you're looking for, you didn't mention whitespace.
Word diffing works by running an ordinary line diff, then rerunning it on changed lines after putting line breaks between identified words, then reconstructing the original line boundaries for presentation with color and/or other embedded change-boundary markers.
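To make that description concrete, here is a rough Python sketch of the idea: split both versions into "words" with a regex, diff the word lists, and stitch the result back together with [-...-] and {+...+} markers. It is only an illustration, not git's implementation: the word_diff function is a made-up name, difflib stands in for git's diff engine (so the exact grouping of changes can differ from what git prints), and the regex is a Python approximation of a POSIX word regex.

```python
import re
from difflib import SequenceMatcher

# Assumption: Python approximation of a word regex in which every character
# belongs to some word. Python's `re` takes the first alternative that matches,
# while POSIX regexes take the longest, so the multi-character alternative
# has to come first here.
WORD_RE = re.compile(r"[A-Za-z0-9_]+|.", re.DOTALL)


def word_diff(old: str, new: str, word_re=WORD_RE) -> str:
    """Diff two strings word by word and mark the changes inline."""
    a = word_re.findall(old)  # words of the old text
    b = word_re.findall(new)  # words of the new text
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
        if tag in ("delete", "replace"):
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


old = "I hold the big target, and I am hit"
new = "I fold the target; and I am fit"
print(word_diff(old, new))
```

Because every character of both inputs lands in exactly one word, dropping the {+...+} parts recovers the old text and dropping the [-...-] parts recovers the new one, whichever way the diff engine happens to group the changes.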
It can be made to care about every character by specifying a word regex that makes every character part of some "word" that needs to be diffed:
git diff --color-words='.|[[:alnum:]_]+'
Done. Now every character that isn't in an alnum-or-underscore sequence gets treated as its own word and diffed like any other text.
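If it helps to see what the two word regexes actually pick out, here is a small Python sketch that tokenizes a fragment of the example sentence both ways. The patterns are approximations, not the exact git regexes: Python's re has no [[:alnum:]] classes, so A-Za-z0-9 is assumed (the C-locale meaning), the UTF_8_GUARD part from the question is left out, and the alternatives are reordered because Python takes the first matching alternative rather than the longest one. The names NON_SPACE_WORDS, EVERY_CHAR_WORDS and tokens are made up for the illustration.

```python
import re

# Assumption: stand-in for '[^[:space:]]|[[:alnum:]]+' (whitespace is never a word).
NON_SPACE_WORDS = re.compile(r"[A-Za-z0-9]+|\S")

# Assumption: stand-in for '.|[[:alnum:]_]+' (every character is part of a word).
EVERY_CHAR_WORDS = re.compile(r"[A-Za-z0-9_]+|.")


def tokens(pattern, text):
    """Return the list of 'words' the pattern finds in text."""
    return pattern.findall(text)


text = "the big target,"
print(tokens(NON_SPACE_WORDS, text))   # ['the', 'big', 'target', ',']
print(tokens(EVERY_CHAR_WORDS, text))  # ['the', ' ', 'big', ' ', 'target', ',']
```

With the first pattern the spaces never become words, so nothing records that a space went away together with big. With the second, each space is a word of its own and shows up in the diff like any other deleted or added text.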