Online decoder: https://2cyr.com/decode/?lang=en
Some URLs reach my site with Cyrillic text that was encoded incorrectly, perhaps in the sitemap or when crawled by search bots. All I know is that the result should be Cyrillic and that I need to decode it for a 301 redirect. Most of them convert successfully with iconv('UTF-8', 'Windows-1252', urldecode($text))
and iconv('UTF-8', 'ISO-8859-1', urldecode($text))
. But for some (e.g. %C3%90%C5%93%C3%90%C2%B8%C3%90%C2%BA%C3%91%C6%92%C3%91%E2%82%AC%C3%91%C6%92%20%C3%90%C2%90%C3%91%C2%81%C3%90%C2%B0%C3%91%E2%80%A6%C3%90%C2%B8%C3%90%C2%BD%C3%90%C2%B0) both calls return false. According to the online decoders, iconv('UTF-8', 'Windows-1252', urldecode($text))
should work, but only with some kind of "x-esc-entities" post filter. How do I implement that in PHP?
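It was tricky, but I managed to make it work. The text is UTF-8 that was misread as Windows-1252 and then encoded as UTF-8 again. Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252, so they ended up as the control characters U+0081, U+008D, U+008F, U+0090 and U+009D (your sample contains %C2%90 and %C2%81), and iconv cannot convert those back to Windows-1252. The code below passes every character in the range U+0080 to U+009F through as a raw byte and converts everything else normally: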
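$text = '%C3%90%C5%93%C3%90%C2%B8%C3%90%C2%BA%C3%91%C6%92%C3%91%E2%82%AC%C3%91%C6%92%20%C3%90%C2%90%C3%91%C2%81%C3%90%C2%B0%C3%91%E2%80%A6%C3%90%C2%B8%C3%90%C2%BD%C3%90%C2%B0';
$out = '';
// Decode the percent-escapes, then convert to UCS-2BE so every character is exactly 2 bytes.
$ucs = iconv('UTF-8', 'UCS-2BE', urldecode($text));
foreach (str_split($ucs, 2) as $c)
{
    if ($c >= "\x00\x80" && $c <= "\x00\x9F")
        // U+0080..U+009F: keep the low byte as-is.
        $out .= $c[1];
    else
        // Everything else maps back to a single Windows-1252 byte.
        $out .= iconv('UCS-2BE', 'windows-1252', $c);
}
echo $out;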
Output:
Микуру Асахина
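
For use in a redirect handler, the same loop could be wrapped in a function that reports failure instead of producing partial output. This is only a sketch: the name decode_mojibake_url is made up for illustration, it assumes an iconv implementation that returns false for characters it cannot map (some implementations substitute a placeholder instead), and it treats "the result is valid UTF-8" as the success test.

```php
<?php
// Hypothetical helper (not part of PHP): undoes one round of
// "UTF-8 misread as Windows-1252" and returns null when that is not possible.
function decode_mojibake_url(string $text): ?string
{
    // Decode the percent-escapes, then expand to fixed-width 2-byte units.
    $ucs = @iconv('UTF-8', 'UCS-2BE', urldecode($text));
    if ($ucs === false || $ucs === '') {
        return null; // not valid UTF-8 (or empty), nothing to repair
    }

    $bytes = '';
    foreach (str_split($ucs, 2) as $unit) {
        $code = (ord($unit[0]) << 8) | ord($unit[1]);
        if ($code >= 0x80 && $code <= 0x9F) {
            // C1 control character: stands for the raw byte of the same value.
            $bytes .= chr($code);
            continue;
        }
        $byte = @iconv('UCS-2BE', 'Windows-1252', $unit);
        if ($byte === false) {
            return null; // character has no Windows-1252 byte
        }
        $bytes .= $byte;
    }

    // Accept the result only if the recovered bytes form valid UTF-8.
    return preg_match('//u', $bytes) === 1 ? $bytes : null;
}
```

A caller would fall back to plain urldecode($text) when the function returns null, since a URL that was encoded correctly in the first place contains Cyrillic characters that have no Windows-1252 byte. Comparing numeric code points instead of raw strings is a design choice here; it avoids relying on PHP's string comparison rules.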