I have the following string, which I want to 'clean' of repeated whitespace:
$string = "This is a test string";
Not a big deal, right? However, the string is not 'cleaned' after using:
$string = preg_replace('/\s+/', ' ', $string);
When I output it as ISO-8859-1, the string looks like this:
$test = "This is a  test string";
So, how can I remove these characters?
You may use the /u (Unicode) modifier:
$string = preg_replace('/\s+/u', ' ', $string);
The /u modifier makes the PCRE engine treat the pattern and subject as UTF-8 strings (by turning on the PCRE_UTF8 option) and makes the shorthand character classes in the pattern Unicode-aware (by enabling the PCRE_UCP option).
The main point is that \s will now match all Unicode whitespace, and the input string is treated as a Unicode string.
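To see the difference, here is a minimal sketch. It assumes the stray characters in the question are UTF-8 non-breaking spaces (U+00A0, bytes 0xC2 0xA0), which the question does not confirm; the variable names $nbsp, $withoutU, $withU and $latin1 are only for illustration.

```php
<?php
// Assumption: the "invisible" characters are UTF-8 encoded U+00A0 (NO-BREAK SPACE).
// The bytes are written explicitly so the example does not depend on the editor.
$nbsp = "\xC2\xA0";
$string = "This is a " . $nbsp . $nbsp . "test " . $nbsp . " string";

// Without /u the subject is processed byte by byte, so the two-byte
// U+00A0 sequence is not seen as a single whitespace character.
$withoutU = preg_replace('/\s+/', ' ', $string);

// With /u the subject is decoded as UTF-8 and \s also matches U+00A0,
// so each mixed run of spaces collapses to a single ASCII space.
$withU = preg_replace('/\s+/u', ' ', $string);
var_dump($withU === 'This is a test string');

// Caveat: /u requires valid UTF-8. A lone 0xA0 byte (an ISO-8859-1
// non-breaking space) is not valid UTF-8, so preg_replace() returns null.
$latin1 = preg_replace('/\s+/u', ' ', "a\xA0b");
var_dump($latin1 === null);
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);
```

If the data really is ISO-8859-1 rather than UTF-8, the /u call will return null as in the last case, so it is worth checking the result (or preg_last_error()) and converting the string to UTF-8 first.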