
How do I iterate over a string containing Cyrillic characters?


I am trying to iterate over a string containing Cyrillic characters and build it up by concatenation, but my code returns mangled text.

Here's the code:

$str = "слово";
$temp = "";
for ($i = 0; $i < strlen($str); $i++) {
    $temp.=$str[$i];
    echo $temp . '<br>';
}
echo $temp;

Output:

�<br>с<br>с�<br>сл<br>сл�<br>сло<br>сло�<br>слов<br>слов�<br>слово<br>слово

Desired Output:

с<br>сл<br>сло<br>слов<br>слово<br>слово

I've also tried using mb_strlen() instead of strlen(), but that didn't work either.


Solution

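  • You cannot use byte offsets such as $str[$i] to access multibyte characters. In UTF-8, each Cyrillic letter occupies two bytes, so $str[$i] returns only half of a character, which is why every other line of your output ends in �.

    You need both mb_strlen() and mb_substr(): the first counts characters rather than bytes, and the second extracts a whole character at a given character offset. Swapping in mb_strlen() alone does not help, because $str[$i] still reads single bytes.

    Note: caching $len is a good idea. mb_ functions are comparatively expensive, so it is best to avoid calling them more often than necessary.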
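
To see the byte/character mismatch directly, here is a minimal sketch that compares the byte-oriented and multibyte-aware functions on the same string. It assumes the source file and the string are UTF-8 encoded and that the mbstring extension is enabled; the variable names $byteCount and $charCount are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$str = "слово";

$byteCount = strlen($str);             // counts bytes
$charCount = mb_strlen($str, 'UTF-8'); // counts characters

echo $byteCount, "\n"; // 10
echo $charCount, "\n"; // 5

// $str[0] is only the first byte of "с"; mb_substr() returns both bytes.
echo bin2hex($str[0]), "\n";                        // d1
echo bin2hex(mb_substr($str, 0, 1, 'UTF-8')), "\n"; // d181
```

A lone byte such as 0xD1 is not valid UTF-8 on its own, which is why the browser renders it as �.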

    Code:

    $str = "слово";
    $temp = "";
    for ($i = 0, $len = mb_strlen($str); $i < $len; $i++) {
        $temp .= mb_substr($str, $i, 1);
        echo $temp . '<br>';
    }
    echo $temp;
    

    Output:

    с<br>сл<br>сло<br>слов<br>слово<br>слово
    

    Depending on what your actual project needs are, here is an alternative that doesn't require a $temp variable:

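    $str = "слово";
    for ($i = 0, $len = mb_strlen($str); $i < $len; $i++) {
        // Print the separator before every prefix except the first.
        if ($i) {
            echo '<br>';
        }
        // Take the first $i + 1 characters directly from the original string.
        echo mb_substr($str, 0, $i + 1);
    }
    // с<br>сл<br>сло<br>слов<br>слово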
    

    More simply, on PHP 7.4 or later you can split the string into an array of individual characters with mb_str_split() and iterate over that.

    $str = "слово";
    $temp = "";
    foreach (mb_str_split($str) as $char) {
        $temp .= $char;
        echo $temp . '<br>';
    }
    echo $temp;
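
If mb_str_split() is not available (it was added in PHP 7.4), one possible fallback is preg_split() with the u modifier, which splits on character boundaries instead of byte boundaries. The sketch below is not part of the original answer: utf8_prefixes() is a hypothetical helper name, and it assumes valid UTF-8 input and PHP 7.0 or later for the type declarations.

```php
<?php
// Hypothetical helper (not from the original answer): returns every prefix
// of a UTF-8 string, each one character longer than the previous.
function utf8_prefixes(string $str): array
{
    // The "u" modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() returns false on failure, e.g. invalid UTF-8 input.
        return [];
    }

    $prefixes = [];
    $temp = '';
    foreach ($chars as $char) {
        $temp .= $char;
        $prefixes[] = $temp;
    }
    return $prefixes;
}

echo implode('<br>', utf8_prefixes("слово"));
// с<br>сл<br>сло<br>слов<br>слово
```

Returning the prefixes as an array keeps the string handling separate from the output, so the same helper can feed HTML, a log, or a test.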