php iconv urldecode

Decoding Cyrillic in PHP: UTF-8 to Windows-1252


Some URLs reach my site with Cyrillic text that was encoded incorrectly, perhaps in the sitemap or by search bots while crawling. All I know is that the URL should contain Cyrillic and that I need to decode it for a 301 redirect.

Most of them convert successfully with iconv('UTF-8', 'Windows-1252', urldecode($text)) or iconv('UTF-8', 'ISO-8859-1', urldecode($text)). But for some (e.g. %C3%90%C5%93%C3%90%C2%B8%C3%90%C2%BA%C3%91%C6%92%C3%91%E2%82%AC%C3%91%C6%92%20%C3%90%C2%90%C3%91%C2%81%C3%90%C2%B0%C3%91%E2%80%A6%C3%90%C2%B8%C3%90%C2%BD%C3%90%C2%B0) iconv returns false.

Judging by an online decoder (https://2cyr.com/decode/?lang=en), iconv('UTF-8', 'Windows-1252', urldecode($text)) should work, but with some kind of "x-esc-entities" post-filter. How can I implement that in PHP?


Solution

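  • It was tricky, but I managed to make it work. Windows-1252 leaves the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so the faulty encoder passed them through as control characters in the range U+0080 to U+009F, and iconv refuses to convert those back. The fix is to copy characters in that range through as raw bytes and let iconv handle everything else: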

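    $text = '%C3%90%C5%93%C3%90%C2%B8%C3%90%C2%BA%C3%91%C6%92%C3%91%E2%82%AC%C3%91%C6%92%20%C3%90%C2%90%C3%91%C2%81%C3%90%C2%B0%C3%91%E2%80%A6%C3%90%C2%B8%C3%90%C2%BD%C3%90%C2%B0';
    $out = '';
    // Convert to fixed-width UCS-2 so every character is exactly two bytes.
    $ucs = iconv('UTF-8', 'UCS-2BE', urldecode($text));
    foreach (str_split($ucs, 2) as $c)
    {
        if ($c >= "\x00\x80" && $c <= "\x00\x9F")
        {
            // U+0080..U+009F has no Windows-1252 mapping in iconv:
            // copy the low byte through unchanged.
            $out .= $c[1];
        }
        else
        {
            // Any other character goes through the normal conversion.
            $out .= iconv('UCS-2BE', 'windows-1252', $c);
        }
    }
    echo $out;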
    

    Output:

    Микуру Асахина
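
The same idea can be wrapped in a reusable function that reports failure instead of silently producing garbage. This is only a sketch: the name fix_mojibake_url() is hypothetical, it assumes the iconv extension is loaded, and it assumes the input went through exactly one round of "UTF-8 bytes misread as Windows-1252". It returns null when the input does not fit that pattern.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Undoes one layer of "UTF-8 bytes misread as Windows-1252" in a
// percent-encoded string; returns null if the input does not fit.
function fix_mojibake_url(string $encoded): ?string
{
    // @ suppresses the notice iconv raises on input it cannot convert.
    $ucs = @iconv('UTF-8', 'UCS-2BE', urldecode($encoded));
    if ($ucs === false) {
        return null; // not valid UTF-8, or contains non-BMP characters
    }
    if ($ucs === '') {
        return '';
    }

    $out = '';
    foreach (str_split($ucs, 2) as $unit) {
        $code = unpack('n', $unit)[1]; // big-endian 16-bit code unit
        if ($code <= 0x9F) {
            // ASCII and U+0080..U+009F: the code point is the original byte.
            $out .= chr($code);
        } else {
            $byte = @iconv('UCS-2BE', 'Windows-1252', $unit);
            if ($byte === false) {
                return null; // character has no Windows-1252 byte
            }
            $out .= $byte;
        }
    }

    // The recovered bytes should form valid UTF-8; reject them otherwise.
    return preg_match('//u', $out) === 1 ? $out : null;
}
```

For the 301 redirect mentioned in the question, the returned string would typically be passed through rawurlencode() before being placed in a Location header. A null result suggests the URL was not damaged in this particular way and should be left alone.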