In a PHP project I use the idn_to_utf8 function to convert domain names from Punycode to Unicode strings.
But sometimes this function returns the Punycode instead of the Unicode string.
Example:
echo idn_to_utf8('xn--fiq57vn0d561bf5ukfonh1o');
// Returns: xn--fiq57vn0d561bf5ukfonh1o
// It should return: 中島第２駐輪場 (with a full-width ２, U+FF12)
echo idn_to_utf8('xn--fiqu6mnndw87c3ucbt0a1ea684a');
// Returns: 中味鋺自転車置場
There are libraries which convert this Punycode correctly (http://idnaconv.phlymail.de/index.php?encoded=xn--fiq57vn0d561bf5ukfonh1o&decode=%3C%3C+Decode&lang=de), but I would prefer to use a built-in PHP function rather than a library.
Do you have any idea what causes this problem?
Edit / Solution and explanation: To summarize, this code shows the problem:
echo idn_to_ascii('吉津第２自転車置場');
?><br /><?php
echo idn_to_utf8(idn_to_ascii('吉津第２自転車置場'));
?> Should be : 吉津第２自転車置場 <br /><?php
This code displays the following :
xn--2-958a11kws1a96p50fgxenr6afga
吉津第2自転車置場 (should be: 吉津第２自転車置場)
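To be clearer: before computing the Punycode of 吉津第２自転車置場, PHP first converts the string to 吉津第2自転車置場. The character "2" is different: the full-width digit ２ (U+FF12) becomes the ASCII digit 2 (U+0032). So idn_to_ascii cannot preserve every Unicode character, because PHP maps certain Unicode characters to others before encoding. In this example PHP converts ２ to 2 (sorry for how "two to two" sounds). That is also why the first Punycode string in the question is not decoded: it contains no ASCII "2" before a "-" delimiter, so it encodes the full-width digit, which idn_to_ascii would never produce.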
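The mapping described above can be reproduced outside of the IDN functions. The following is only a minimal sketch, assuming the intl extension is loaded; it uses NFKC normalization, which is one of the steps Nameprep applies, so it approximates the folding rather than reimplementing idn_to_ascii. The helper names foldCompat and changesUnderFolding are hypothetical, not PHP built-ins.

```php
<?php
// Sketch only: requires the intl extension. foldCompat() and
// changesUnderFolding() are hypothetical helper names, not PHP built-ins.

// Apply NFKC compatibility normalization (one of the Nameprep steps).
function foldCompat(string $label): string
{
    $normalized = Normalizer::normalize($label, Normalizer::FORM_KC);
    // normalize() returns false on failure; fall back to the input.
    return $normalized === false ? $label : $normalized;
}

// True when folding alters the label, i.e. it cannot survive a round trip.
function changesUnderFolding(string $label): bool
{
    return foldCompat($label) !== $label;
}

$fullWidthTwo = "\u{FF12}"; // full-width digit two

var_dump(foldCompat($fullWidthTwo) === "2");         // bool(true)
var_dump(changesUnderFolding("第" . $fullWidthTwo));  // bool(true)
var_dump(changesUnderFolding("第2"));                 // bool(false)
```

A check like changesUnderFolding could be run on a name before storing it, so you know in advance that the decoded form will differ from what was typed.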
This works fine. I think the full-width characters [Ａ-Ｚ０-９] cannot be used.
echo idn_to_utf8('xn--2-kq6aw43af1e4y9boczagup'); // 中島第2駐輪場
In fact, Chrome automatically converts 中島第２駐輪場.com (full-width ２)
into 中島第2駐輪場.com (ASCII 2)
before accessing it.
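To see what idn_to_utf8 does with each label, it may help to ask intl for its diagnostics instead of only echoing the result. This is a rough sketch, assuming the intl extension with UTS #46 support; decodeLabel is a hypothetical helper name, and the exact error bits reported depend on the ICU version, so no output is shown here.

```php
<?php
// Sketch only: requires the intl extension with UTS #46 support.
// decodeLabel() is a hypothetical helper, not part of PHP.
function decodeLabel(string $ace): array
{
    $info = [];
    // The fourth argument is filled by intl with diagnostic details.
    $result = idn_to_utf8($ace, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46, $info);

    return [
        'input'   => $ace,
        'decoded' => $result,                 // string, or false on failure
        'errors'  => $info['errors'] ?? null, // bitset of IDNA_ERROR_* flags
        'changed' => is_string($result) && $result !== $ace,
    ];
}

// The two labels discussed in this thread.
foreach (['xn--fiq57vn0d561bf5ukfonh1o', 'xn--2-kq6aw43af1e4y9boczagup'] as $ace) {
    print_r(decodeLabel($ace));
}
```

If 'changed' is false or 'errors' is non-zero for a label, the function did not accept it as a valid encoded name, which matches the symptom in the question.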
UPDATED:
A normalization rule named Nameprep
seems to be specified for this: https://www.nic.ad.jp/ja/dom/idn.html
UPDATED:
That seems to be invalid...