I need urgent help. I can't compare two strings that should be equal. A string written to table1 in a database is in the UTF-8 charset but still looks strange: ＳＡＤＩ
However, the same string written to table2 in the same database is SADI, which is normal.
Whenever I compare the two, the result is false.
Any idea how the comparison can be made? (The comparison should actually return true.)
Alternatively, any idea how I can insert ＳＡＤＩ as SADI into the database?
Either would hopefully be a solution.
In your strings, SADI is a standard ASCII string, but ＳＡＤＩ uses full-width Unicode characters.
For example, Ｓ is U+FF33 'FULLWIDTH LATIN CAPITAL LETTER S' (UTF-8: 0xEF 0xBC 0xB3), but S is standard ASCII U+0053 'LATIN CAPITAL LETTER S' (UTF-8: 0x53).
The other characters are likewise full-width Unicode characters: they look like standard Latin letters, but they are different code points, so a plain string comparison treats them as different.
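To see the difference for yourself, it helps to dump the code points and UTF-8 bytes of both strings. Here is a minimal sketch in Python (not the Perl/PHP used elsewhere in this answer); the helper name `describe` and the variable names are made up for illustration, and the full-width string is written with escapes so it survives copy-pasting:

```python
import unicodedata

# Full-width S, A, D, I written as escapes (assumed to match the table1 value).
fullwidth = "\uFF33\uFF21\uFF24\uFF29"
ascii_text = "SADI"

def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes as hex) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex(" "))
        for ch in text
    ]

# Print one row per character, full-width string first.
for row in describe(fullwidth) + describe(ascii_text):
    print(*row, sep="  ")
```

Each full-width letter takes three bytes in UTF-8, while each ASCII letter takes one, which is why the two values can never compare equal byte for byte.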
How did they get there? That's a good question. Probably somebody got really creative and copy-pasted something from Word. Who knows.
You can convert these characters back to normal ones by applying Unicode Normalization Form KC (NFKC), for example with this Perl script used as a filter (it accepts UTF-8 on standard input and outputs normalized UTF-8):
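use strict;
use warnings;
use Unicode::Normalize;               # provides NFKC()
binmode STDIN,  ':encoding(UTF-8)';   # decode input as UTF-8
binmode STDOUT, ':encoding(UTF-8)';   # encode output as UTF-8
# Normalize each input line and print it
while (my $line = <STDIN>) { print NFKC($line); }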
In PHP:
$result = Normalizer::normalize( $str, Normalizer::FORM_KC );
This requires the intl extension.
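For the comparison itself, the idea is to normalize both sides before comparing (or to normalize once, before inserting into the database). A rough sketch of that in Python, using the standard `unicodedata` module; the function name `nfkc_equal` and the variable names are hypothetical, not part of any library:

```python
import unicodedata

def nfkc_equal(a, b):
    """Compare two strings after applying NFKC normalization to both."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

stored = "\uFF33\uFF21\uFF24\uFF29"   # full-width form, as assumed for table1
expected = "SADI"                      # plain ASCII form, as in table2

print(stored == expected)                      # False: different code points
print(nfkc_equal(stored, expected))            # True: equal after NFKC
print(unicodedata.normalize("NFKC", stored))   # SADI (plain ASCII)
```

Keep in mind that NFKC folds other compatibility characters as well (ligatures, superscript digits and so on), so normalizing before insert changes the stored data, while normalizing only at comparison time leaves it untouched.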