textencodingdiacritics

What is this crazy German character combination to represent an umlaut?


I was just parsing the following website.

There one finds the text

und wären damit auch

At first, the "ä" looks perfectly fine, but once I inspect it, it turns out that this is not the regular "ä" (represented as ascw 228) but this:

ascw: 97, char: a
ascw: 776, char: ¨

I have never before seen an "ä" represented like this.

How can it happen that a website uses this weird character combination and what might be the benefit from it?


Solution

  • What you don't mention in your questions is the used encoding. Quite obviously it is a Unicode based encoding.

    In Unicode, code point U+0308 (776 in decimal) is the combining diaeresis. Out of the letter a and the diaeresis, the German character ä is created.

    There are indeed two ways to represent German characters with umlauts (ä in this case). Either as a single code point:

    U+00E4 latin small letter A with diaeresis
    

    Or as a sequence of two code points:

    U+0061 latin small letter A
    U+0308 combining diaeresis
    

    Similarly you would combine two code points for an upper case 'Ä':

    U+0041 latin capital letter A
    U+0308 combining diaeresis
    

    In most cases, Unicode works with two codes points as it requires fewer code points to enable a wide range characters with diacritics. However for historical reasons a special code point exist for letters with German umlauts and French accents.

    The Unicode libraries is most programming languages provide functions to normalize a string, i.e. to either convert all sequences into a single code point if possible or extend all single code points into the two code point sequence. Also see Unicode Normalization Forms.