Tags: php, encoding, multibyte, thai, southeast-asian-languages

Manipulating Thai Characters in PHP


I'm struggling to get Thai characters and PHP working together. This is what I'd like to do:

<?php
   mb_internal_encoding('UTF-8');
   $string = "ทาง";
   echo $string[0];
?>

But instead of giving me the first character of $string (ท), I just get garbled output. Displaying $string itself works fine, however.

The file itself is of course saved as UTF-8, and the Content-Type header also specifies UTF-8. I changed the necessary lines in php.ini according to this site.

utf8_encode() and utf8_decode() don't help either. Does anyone have an idea?


Solution

  • In PHP, accessing a string with $string[0] does not return the first character but the first byte.

    You should use mb_substr instead. For example:

    mb_substr($string, 0, 1, 'UTF-8');
    

    Note: Since you already call mb_internal_encoding('UTF-8'), you can omit the last parameter.


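    This happens because PHP is not aware of the encoding a string is in (that is, the encoding is not stored in the string object), so it treats every string as a plain sequence of bytes. In UTF-8 each Thai character takes three bytes, so $string[0] returns only the first of the three bytes that make up ท, which is not a valid character on its own. If you want character-based access, you must use the Multibyte String functions (mb_*).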

    When you call mb_internal_encoding('UTF-8'), you are telling PHP to use UTF-8 for all the Multibyte String functions, but not for anything else.
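
To make the byte/character difference concrete, here is a minimal sketch that compares both views of the string from the question. It assumes the source file is saved as UTF-8, that the mbstring extension is loaded, and that PHP 7.4 or later is available (for mb_str_split); the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is enabled.
mb_internal_encoding('UTF-8');

$string = "ทาง";

// Byte view: each of the three Thai characters occupies three bytes in UTF-8.
$byteLength = strlen($string);         // 9
$firstByte  = bin2hex($string[0]);     // "e0", only the lead byte of ท

// Character view: the mb_* functions decode UTF-8 before counting or slicing.
$charLength = mb_strlen($string);      // 3
$firstChar  = mb_substr($string, 0, 1);

// Split into an array of characters for index-style access (PHP 7.4+).
$chars = mb_str_split($string);

printf("%d bytes, %d characters\n", $byteLength, $charLength); // 9 bytes, 3 characters
printf("first byte: 0x%s\n", $firstByte);                      // first byte: 0xe0
printf("first character: %s\n", $firstChar);
printf("second character: %s\n", $chars[1]);
```

Note that mb_substr and mb_str_split work on code points. Thai tone marks and above/below vowels are separate code points, so a word that contains them will be split into more pieces than a reader would count as letters.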