php preg-replace character-set

How to replace invisible characters (which are not actually spaces) with regex


I have the following string, which I want to 'clean' of repeated whitespace:

$string = "This is   a test string";

Not a big deal, right? However, the string is still not 'cleaned' after running:

$string = preg_replace('/\s+/', ' ', $string);

The reason shows up when I output the string in ISO-8859-1, where it looks like this:

$test = "This is a  test string";

So, how can I remove these characters?


Solution

  • You may use the /u UNICODE modifier:

    $string = preg_replace('/\s+/u', ' ', $string);
    

    The /u modifier makes the PCRE engine treat both the pattern and the subject string as UTF-8 (it turns on the PCRE_UTF8 option) and makes the shorthand character classes in the pattern Unicode-aware (it also turns on the PCRE_UCP option).

    The main point is that \s will now match all Unicode whitespace and the input string is treated as a Unicode string.
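
To make the effect concrete, here is a minimal sketch. It assumes the invisible characters are no-break spaces (U+00A0) and that the string is UTF-8 encoded; the sample string and the null fallback are illustrative, not taken from the question.

```php
<?php
// Subject with two no-break spaces (U+00A0, bytes 0xC2 0xA0 in UTF-8)
// followed by an ordinary space. This sample string is an assumption.
$string = "This is\xC2\xA0\xC2\xA0 a test string";

// With /u, \s matches Unicode whitespace, so the whole run collapses
// into a single ordinary space.
$clean = preg_replace('/\s+/u', ' ', $string);

// Under /u, preg_replace() returns null when the subject is not valid UTF-8.
if ($clean === null) {
    // Hypothetical fallback: leave the input untouched.
    $clean = $string;
}

echo $clean, "\n";
```

The null check matters if the input may actually be ISO-8859-1: a lone 0xA0 byte is not valid UTF-8, so the /u pattern would fail rather than match it. In that case the string would need converting to UTF-8 first.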