I need to make diff ignore the case of my inputs. Both inputs contain German umlauts like ä and Ä. Option -i successfully makes diff ignore the case of my input for other characters like a and A, but not for umlauts:
$ diff -i <(echo ä) <(echo Ä)
1c1
< ä
---
> Ä
The output should be empty, as ä and Ä should be seen as the same letter if case is ignored. If I try this instead:
$ diff -i <(echo a) <(echo A)
Then it works as expected (no output).
I also tried to set the environment variable LANG to make diff use the correct locale, but this didn’t seem to have any influence:
$ LANG=de_DE.UTF-8 diff -i <(echo ä) <(echo Ä)
I tried various values for LANG.
Is there a way to make diff ignore the case of German umlauts?
(I’m on Ubuntu 22.04 FWIW.)
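Compare normalized strings (see Unicode normalization forms):
diff -i <(echo ä | uconv -x Any-NFD) <(echo Ä | uconv -x Any-NFD)
In NFD, ä and Ä are decomposed into a and A followed by the same combining diaeresis (U+0308), so the only remaining difference is an ASCII case difference, which diff -i handles.
Note: uconv comes from the icu-devtools package (sudo apt install icu-devtools).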
FYI:
Form String StrLen Unicode
---- ------ ------ -------
NFC äÄ 2 \u00e4\u00c4
NFD äÄ 4 \u0061\u0308\u0041\u0308
NFKC äÄ 2 \u00e4\u00c4
NFKD äÄ 4 \u0061\u0308\u0041\u0308
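For real files rather than echo output, the same idea can be wrapped in a small function. This is only a sketch: the name diff_nfd_i is made up, and it assumes uconv is installed and both files are UTF-8. It uses temporary files instead of process substitution so it also works in shells that lack <(...).

```shell
# Hypothetical helper (the name diff_nfd_i is an assumption, not a standard tool).
# Assumes uconv is installed and both input files are UTF-8 encoded.
# Returns diff's status: 0 if equal, 1 if different, 2 on trouble.
diff_nfd_i() {
    nfd_a=$(mktemp) || return 2
    nfd_b=$(mktemp) || { rm -f "$nfd_a"; return 2; }
    # Decompose both inputs to NFD so that umlauts become base letter + U+0308.
    if uconv -f UTF-8 -t UTF-8 -x Any-NFD "$1" > "$nfd_a" &&
       uconv -f UTF-8 -t UTF-8 -x Any-NFD "$2" > "$nfd_b"; then
        # Only ASCII case differences remain, which diff -i can ignore.
        diff -i "$nfd_a" "$nfd_b"
        nfd_status=$?
    else
        nfd_status=2
    fi
    rm -f "$nfd_a" "$nfd_b"
    return "$nfd_status"
}
```

Usage would be something like diff_nfd_i old.txt new.txt (file names are placeholders). Any lines that diff prints are shown in their decomposed form.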
From info diff [emphasis mine]:
18.1.1 Handling Multibyte and Varying-Width Characters
‘diff’, ‘diff3’ and ‘sdiff’ treat each line of input as a string of unibyte characters. This can mishandle multibyte characters in some cases. For example, when asked to ignore spaces, ‘diff’ does not properly ignore a multibyte space character.
Also, ‘diff’ currently assumes that each byte is one column wide, and this assumption is incorrect in some locales, e.g., locales that use UTF-8 encoding. This causes problems with the ‘-y’ or ‘--side-by-side’ option of ‘diff’.
These problems need to be fixed without unduly affecting the performance of the utilities in unibyte environments.
The IBM GNU/Linux Technology Center Internationalization Team has proposed patches to support internationalized ‘diff’ (http://oss.software.ibm.com/developer/opensource/linux/patches/i18n/diffutils-2.7.2-i18n-0.1.patch.gz). Unfortunately, these patches are incomplete and are to an older version of ‘diff’, so more work needs to be done in this area.
Ubuntu 24.04 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64)
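The byte-wise behaviour described in the manual excerpt can be seen by dumping the UTF-8 bytes. A small sketch, using octal escapes so it does not depend on the terminal encoding:

```shell
# Precomposed forms (NFC): the two letters differ in the second byte,
# and that byte is not an ASCII letter, so byte-wise case folding cannot match them.
printf '\303\244' | od -An -tx1    # ä (U+00E4) ->  c3 a4
printf '\303\204' | od -An -tx1    # Ä (U+00C4) ->  c3 84

# Decomposed forms (NFD): ASCII base letter followed by U+0308 (bytes cc 88).
# Only the first byte differs, and 61/41 is the ordinary a/A pair.
printf 'a\314\210' | od -An -tx1   # ->  61 cc 88
printf 'A\314\210' | od -An -tx1   # ->  41 cc 88
```

This is consistent with -i working for a and A, failing for precomposed ä and Ä, and working again once both sides are normalized to NFD.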