
Normalize a Unicode string in PHP


In PHP,

mb_strtolower('İspanyolca');

returns

U+0069    i  LATIN SMALL LETTER I
U+0307    ̇   COMBINING DOT ABOVE
U+0073    s  LATIN SMALL LETTER S
U+0070    p  LATIN SMALL LETTER P
etc.
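To see exactly which code points come back, a small helper like the hypothetical dumpCodePoints() below can print them. This is a sketch that assumes the mbstring extension and PHP 7.4+ (for mb_str_split()). The expected leading U+0069 U+0307 assumes PHP 7.3+, where mb_strtolower() uses full case mapping.

```php
<?php
// Hypothetical helper: list each code point of a UTF-8 string as "U+XXXX".
// Requires mbstring; mb_str_split() needs PHP 7.4+.
function dumpCodePoints(string $s): array
{
    $out = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $char) {
        // mb_ord() returns the integer code point of a single character.
        $out[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $out;
}

$lower = mb_strtolower('İspanyolca', 'UTF-8');
echo implode(' ', dumpCodePoints($lower)), PHP_EOL;
```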

I need to get rid of the U+0307 COMBINING DOT ABOVE.

I tried this:

$TheUrl = mb_strtolower('İspanyolca');
$TheUrl = normalizer_normalize($TheUrl, Normalizer::FORM_C);

The combining dot above persists.
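A quick way to confirm that normalization cannot remove the mark is shown below. This is a sketch assuming the intl extension (which provides Normalizer) and PHP 7.3+. Neither NFC nor NFKC changes the string, because Unicode defines no precomposed character for i + U+0307.

```php
<?php
// Requires the intl extension, which provides the Normalizer class.
$lower = mb_strtolower('İspanyolca', 'UTF-8');

// NFC composes a base letter and a combining mark only when Unicode defines
// a precomposed character for that pair; none exists for i + U+0307.
$nfc = Normalizer::normalize($lower, Normalizer::FORM_C);

// NFKC adds compatibility mappings, but still has nothing to compose here.
$nfkc = Normalizer::normalize($lower, Normalizer::FORM_KC);

var_dump($nfc === $lower);
var_dump($nfkc === $lower);
var_dump(mb_strlen($nfc, 'UTF-8')); // 10 letters plus the combining dot
```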


Solution

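  • Since PHP 7.3, mb_strtolower() applies full Unicode case mapping, which lowercases 'İ' (U+0130) to the two code points 'i' + U+0307. Normalization cannot merge them back: NFC only composes a base letter and a combining mark when Unicode defines a precomposed character for that pair, and there is none for i + U+0307. To handle this case, replace the sequence explicitly with strtr(), as in the example below: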

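    $TheUrl = 'İspanyolca';
    // Full case mapping turns 'İ' (U+0130) into 'i' + U+0307 COMBINING DOT ABOVE.
    $TheUrl = mb_strtolower($TheUrl, 'UTF-8');
    // Replace that two-code-point sequence with a plain 'i'. The escapes make the
    // invisible combining mark explicit; the U+0130 entry covers unlowercased input.
    $TheUrl = strtr($TheUrl, ["i\u{0307}" => 'i', "\u{0130}" => 'i']);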

    This replaces the sequence 'i' + U+0307 (a lowercase i followed by a combining dot above) with a plain 'i', and any remaining uppercase 'İ' with a plain lowercase 'i'.
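Another option on PHP 7.3+ is to ask mbstring for simple case mapping instead of full case mapping. This was not part of the original answer. Under simple mapping, 'İ' (U+0130) lowercases to a single U+0069 according to the Unicode character database, so no combining dot is produced in the first place. The sketch below assumes the mbstring extension; lowerSimple() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: lowercase using Unicode *simple* case mapping.
// MB_CASE_LOWER_SIMPLE is available since PHP 7.3. Simple mapping is strictly
// one code point to one code point, so U+0130 becomes a bare U+0069.
function lowerSimple(string $s): string
{
    return mb_convert_case($s, MB_CASE_LOWER_SIMPLE, 'UTF-8');
}

$url = lowerSimple('İspanyolca');
echo $url, PHP_EOL;
```

Neither this nor the strtr() approach is Turkish-aware lowercasing. For example, a plain 'I' still becomes 'i' rather than dotless 'ı', which is usually what you want for URL slugs.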