I have a MySQL varchar(50) column with the cp1251_general_ci collation. After mysql_fetch_row in PHP I get a $string. Then I do the following:
echo mb_detect_encoding($string,'CP1251,UTF-8,Windows-1251'); // echoes Windows-1251
$string = mb_convert_encoding($string, 'UTF-8', 'Windows-1251');
echo mb_detect_encoding($string,'CP1251,UTF-8,Windows-1251'); // again echoes Windows-1251
Why is the string not detected as UTF-8 the second time?
I also tried
$string = iconv('Windows-1251', 'UTF-8', $string);
But again the detected charset is Windows-1251.
As a result, I get a broken encoding in my filename, which is built from the $string variable.
How can I convert from the MySQL cp1251_general_ci collation (Windows-1251) to UTF-8?
P.S.
echo $string; // echoes ������
echo bin2hex($string); // echoes cce5e3e0f4eeed
$string = mb_convert_encoding($string, 'UTF-8', 'Windows-1251');
echo $string; // echoes Мегафон
echo bin2hex($string); // echoes d09cd0b5d0b3d0b0d184d0bed0bd
But
fopen("../tmp/$string.log", "w");
creates a file .../tmp/??????????????.log (on Linux).
Found the reason for this strange behavior!
In short: if a properly encoded UTF-8 string shows up as unreadable symbols on a server (in a terminal), check the server locale. And if mb_detect_encoding() behaves strangely, remember that it does not determine a string's encoding precisely.
The reason for the incorrect encoding in the filename .../tmp/??????????????.log is the locale on the server. Here is the output of the locale command on the server where the file is located:
$ locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
To display UTF-8 characters in file names correctly on the server, the server locale must be UTF-8 too.
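One way to confirm that only the display is affected is to read the file name back and compare raw bytes. This is a minimal sketch, assuming a Linux filesystem that stores names as plain bytes; the scratch directory name is hypothetical and stands in for ../tmp from the question.

```php
<?php
// UTF-8 bytes of "Мегафон", taken from the bin2hex output in the question.
$name = hex2bin('d09cd0b5d0b3d0b0d184d0bed0bd');

// Hypothetical scratch directory (the question used ../tmp instead).
$dir = sys_get_temp_dir() . '/locale_demo_' . getmypid();
mkdir($dir);

// Create the file the same way as in the question.
fclose(fopen("$dir/$name.log", 'w'));

// Read the name back from the filesystem, skipping "." and "..".
$found  = array_values(array_diff(scandir($dir), ['.', '..']));
$stored = $found[0];

// Compare the stored name with the name we asked for, byte by byte.
echo bin2hex($stored), "\n";
var_dump($stored === "$name.log");

// Clean up the scratch file and directory.
unlink("$dir/$stored");
rmdir($dir);
```

If the comparison holds, the name on disk is valid UTF-8 and the question marks come only from listing the directory under the C locale.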
As for the conversions in the question, both methods:
iconv('Windows-1251', 'UTF-8', $string);
and
mb_convert_encoding($string, 'UTF-8', 'Windows-1251');
work fine in this case.
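A short sketch of both conversions, using the byte values from the bin2hex output in the question instead of a database row. It assumes the mbstring and iconv extensions are loaded; the variable names are only illustrative.

```php
<?php
// Windows-1251 bytes of the original string (from the question).
$cp1251 = hex2bin('cce5e3e0f4eeed');

// Convert with both functions.
$viaMb    = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');
$viaIconv = iconv('Windows-1251', 'UTF-8', $cp1251);

// Both should print d09cd0b5d0b3d0b0d184d0bed0bd, as in the question.
echo bin2hex($viaMb), "\n";
echo bin2hex($viaIconv), "\n";

// The two results are byte-for-byte identical.
var_dump($viaMb === $viaIconv);
```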
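The only remaining question is why the second echo in
echo mb_detect_encoding($string,'CP1251,UTF-8,Windows-1251'); // echoes Windows-1251
$string = mb_convert_encoding($string, 'UTF-8', 'Windows-1251');
echo mb_detect_encoding($string,'CP1251,UTF-8,Windows-1251'); // again echoes Windows-1251
does not print UTF-8.
The answer is that mb_detect_encoding does not determine a string's encoding precisely. Most likely this happens because Windows-1251 is a single-byte encoding, so nearly any byte sequence, including the converted UTF-8 bytes, is also valid Windows-1251, and CP1251 (an alias of Windows-1251) comes first in the candidate list.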
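Rather than guessing with mb_detect_encoding, it can be more reliable to ask a yes/no question with mb_check_encoding. The sketch below assumes the mbstring extension and reuses the byte values from the question; the variable names are illustrative.

```php
<?php
// Bytes before and after conversion, from the bin2hex output in the question.
$cp1251 = hex2bin('cce5e3e0f4eeed');
$utf8   = hex2bin('d09cd0b5d0b3d0b0d184d0bed0bd');

// The original bytes are not well-formed UTF-8.
var_dump(mb_check_encoding($cp1251, 'UTF-8')); // bool(false)

// The converted bytes are well-formed UTF-8.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// The same converted bytes checked against the single-byte encoding,
// which is why detection with CP1251 listed first is ambiguous.
var_dump(mb_check_encoding($utf8, 'Windows-1251'));
```

Checking against UTF-8 works here because multi-byte UTF-8 has strict structural rules, while a single-byte code page accepts almost anything.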