
Truncating Chinese text


Our website is in Chinese, and part of the main page shows a list of other page titles truncated to a maximum length of 26 (I assume this counts characters as if the Chinese text were written in English?). The line we use for this is:

<?php echo anchor('projects/'.$rs->url_project_title.'/'.$rs->project_id,substr(ucfirst($rs->project_title),0,26),'style="text-decoration:none;"'); ?>

However, if the title is too long, the code truncates it as it should, but the final two Chinese characters are always shown as ��. I'm guessing it treats the text as if it were English and splits a Chinese character (somehow). Maybe I'm overthinking this!?

For example:

Original:
在国内做一个尊重艺术,能够为青年导演提供平

Truncated version:
在国内做一个尊重��

Can you suggest a modification that shows the desired number of characters without producing the ��'s?


Solution

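  • substr() counts bytes, not characters. Each Chinese character takes several bytes in UTF-8, so the cut can land in the middle of a character and leave bytes that display as �. Use the mbstring functions instead of substr: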

    echo anchor(
        'projects/' . $rs->url_project_title . '/' . $rs->project_id,
        mb_substr(ucfirst($rs->project_title), 0, 26), 
        'style="text-decoration:none;"'
    );
    

    If this does not work, mb_substr() is probably not using the right encoding for the string: without a fourth argument it falls back to the internal encoding rather than detecting one. In that case, pass the right encoding to mb_substr() explicitly:

    // no encoding given: PHP uses the internal encoding, see mb_internal_encoding()
    echo mb_substr($string, 0, 26);
    // encoding given explicitly, for when you know which encoding the input is in
    echo mb_substr($string, 0, 26, 'UTF-8');
    // PHP tries to detect the encoding (detection is a guess and can be wrong)
    echo mb_substr($string, 0, 26, mb_detect_encoding($string));
    

    See mb_detect_encoding() as well for further information.

    Hope this helps.
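
To see the difference on the title from the question, here is a minimal sketch. It assumes the mbstring extension is loaded and that both the script file and the title are UTF-8. The helper name truncate_title and the limit of 8 characters are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original answer): cut $text to at most
// $maxChars characters, counting characters rather than bytes.
function truncate_title(string $text, int $maxChars, string $encoding = 'UTF-8'): string
{
    return mb_substr($text, 0, $maxChars, $encoding);
}

// Title from the question; this file is assumed to be saved as UTF-8.
$title = '在国内做一个尊重艺术,能够为青年导演提供平';

// substr() counts bytes. Each Chinese character here takes 3 bytes in UTF-8,
// so 26 bytes is 8 whole characters plus 2 bytes of the 9th.
$byteCut = substr($title, 0, 26);
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): cut mid-character

// mb_substr() counts characters, so the result is made of whole characters.
$charCut = truncate_title($title, 8);
echo $charCut, "\n"; // 在国内做一个尊重
```

The two leftover bytes of the 9th character are what the browser shows as ��. Passing the encoding explicitly keeps the result independent of the server's internal encoding setting.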