I was just parsing the following website.
There one finds the text
und wären damit auch
At first, the "ä" looks perfectly fine, but once I inspect it, it turns out that this is not the regular "ä" (represented as ascw 228) but this:
ascw: 97, char: a
ascw: 776, char: ¨
I have never before seen an "ä" represented like this.
How can it happen that a website uses this weird character combination and what might be the benefit from it?
What you don't mention in your questions is the used encoding. Quite obviously it is a Unicode based encoding.
In Unicode, code point U+0308 (776 in decimal) is the combining diaeresis. Out of the letter a
and the diaeresis, the German character ä
is created.
There are indeed two ways to represent German characters with umlauts (ä in this case). Either as a single code point:
U+00E4 latin small letter A with diaeresis
Or as a sequence of two code points:
U+0061 latin small letter A
U+0308 combining diaeresis
Similarly you would combine two code points for an upper case 'Ä':
U+0041 latin capital letter A
U+0308 combining diaeresis
In most cases, Unicode works with two codes points as it requires fewer code points to enable a wide range characters with diacritics. However for historical reasons a special code point exist for letters with German umlauts and French accents.
The Unicode libraries is most programming languages provide functions to normalize a string, i.e. to either convert all sequences into a single code point if possible or extend all single code points into the two code point sequence. Also see Unicode Normalization Forms.