php string unicode utf-8 truncate

Truncate a UTF-8 string to fit a given byte count in PHP


Say we have a UTF-8 string $s and need to shorten it so that it can be stored in N bytes. Blindly truncating it to N bytes could split a multibyte character and leave an invalid sequence at the end. But decoding the string just to find the character boundaries is a drag. Is there a tidy way?

[Edit 20100414] In addition to S.Mark’s answer (mb_strcut()), I recently found another function that does the job: grapheme_extract($s, $n, GRAPHEME_EXTR_MAXBYTES) from the intl extension. Since intl is a wrapper around ICU, I have a lot of confidence in it.
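To illustrate how the two functions differ, here is a minimal sketch, assuming both the mbstring and intl extensions are loaded. The sample string and the variable names are assumptions chosen for illustration: "cafe" followed by U+0301 (combining acute accent), which is 6 bytes in UTF-8. Both calls should return valid UTF-8 within the byte budget, but only grapheme_extract() keeps a base letter and its combining mark together.

```php
<?php
// Bail out early if the extensions this sketch relies on are missing.
if (!extension_loaded('intl') || !extension_loaded('mbstring')) {
    exit("This sketch needs the intl and mbstring extensions.\n");
}

// "cafe" + U+0301 COMBINING ACUTE ACCENT: 4 ASCII bytes + 2 bytes = 6 bytes.
$s = "cafe\xcc\x81";
$n = 5; // byte budget (illustrative value)

// mb_strcut() backs off to a character boundary: it keeps "cafe"
// and drops the accent, which no longer fits.
$byChar = mb_strcut($s, 0, $n, 'UTF-8');

// grapheme_extract() backs off to a grapheme-cluster boundary, so the
// three-byte cluster "e" + accent is dropped as a whole.
$byCluster = grapheme_extract($s, $n, GRAPHEME_EXTR_MAXBYTES);

echo bin2hex($byChar), "\n";    // 63616665 ("cafe")
echo bin2hex($byCluster), "\n"; // 636166   ("caf")
```

Which result is preferable depends on the use case: mb_strcut() keeps as many bytes as possible, while grapheme_extract() avoids separating an accent from its letter.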


Solution

  • I don't think you need to reinvent the wheel. You can just use mb_strcut; make sure the encoding is set to UTF-8 first.

    mb_internal_encoding('UTF-8');
    echo mb_strcut("\xc2\x80\xc2\x80", 0, 3); // start at byte offset 0, keep at most 3 bytes
    

    It returns

    \xc2\x80
    

    because in \xc2\x80\xc2 the trailing \xc2 is a lead byte without its continuation byte, so it is an incomplete character and gets dropped.
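The approach above can be wrapped in a small helper. This is a minimal sketch, assuming the mbstring extension is loaded; the function name truncate_utf8_bytes is hypothetical. It passes the encoding as the fourth argument of mb_strcut instead of calling mb_internal_encoding(), so it does not change or depend on global state.

```php
<?php
/**
 * Hypothetical helper: truncate $s to at most $maxBytes bytes without
 * splitting a multibyte UTF-8 character.
 */
function truncate_utf8_bytes(string $s, int $maxBytes): string
{
    // A non-positive budget leaves no room for any character.
    if ($maxBytes <= 0) {
        return '';
    }
    // Passing 'UTF-8' explicitly avoids relying on mb_internal_encoding().
    return mb_strcut($s, 0, $maxBytes, 'UTF-8');
}

echo bin2hex(truncate_utf8_bytes("\xc2\x80\xc2\x80", 3)), "\n"; // c280
echo bin2hex(truncate_utf8_bytes("\xc2\x80\xc2\x80", 4)), "\n"; // c280c280
echo bin2hex(truncate_utf8_bytes("abc", 10)), "\n";             // 616263
```

The output is printed with bin2hex() so that the exact bytes kept are visible regardless of how the terminal renders them.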