
Replace Unicode whitespace in PHP


I'm trying to replace Unicode whitespace characters such as these characters, and I was able to do that using this solution. The problem with this solution is that it doesn't replace Unicode whitespace IN BETWEEN normal characters. For example, with this string using Thin Space (U+2009):

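$string = "\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}\u{2009}\u{2009}"; // thin spaces (U+2009)
echo preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string);
// outputs: test   string (the three thin spaces in the middle are still there)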

I have only a basic understanding of regex, so I don't know what to change in my expression to fix this.
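As far as the regex goes, the pattern above misses the inner spaces because both alternatives are anchored: `^` ties the first one to the start of the string and `$` ties the second one to the end, so only leading and trailing runs can match. The minimal sketch below contrasts it with the same character class without anchors. It assumes PHP 7.0+ for the `"\u{2009}"` escape, and the variable names are just for illustration.

```php
<?php
// Pattern from the question: both alternatives are anchored, so only a run
// at the very start (^) or at the very end ($) of the string can match.
$anchored = '/^[\pZ\pC]+|[\pZ\pC]+$/u';

// Same character class without anchors: every separator (\pZ) or control
// (\pC) character anywhere in the string matches, one at a time.
$unanchored = '/[\pZ\pC]/u';

// "\u{2009}" is a Thin Space (PHP 7.0+ double-quoted string escape).
$string = "\u{2009}test\u{2009}string\u{2009}";

$a = preg_replace($anchored, '', $string);    // ends trimmed, middle kept
$b = preg_replace($unanchored, ' ', $string); // every thin space replaced

var_dump($a === "test\u{2009}string"); // bool(true)
var_dump($b === " test string ");      // bool(true)
```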


Solution

  • Unicode whitespace such as \u{2009} (Thin Space) causes problems in various places. I would therefore replace every Unicode separator (\pZ) or control character (\pC) with a regular space and then apply trim().

    $string = "   test   string and XY \t ";
    //\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}and\x20XY\x20\x09\u{2009}
    
    $trimString = trim(preg_replace('/[\pZ\pC]/u', ' ', $string));
    //test\x20\x20\x20string\x20and\x20XY
    

    Note: The string representations in the comments were produced with debug::writeUni($string, $trimString); from this class.
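If each run of such characters should become a single space (`test string` rather than `test   string`), one variant of the same idea is to add the `+` quantifier before calling trim(). The sketch below wraps this in a hypothetical helper, `normalizeUnicodeSpaces()`, which is not part of the answer or of any library. Keep in mind that `\pC` also matches tabs and line breaks, so those are collapsed into a space as well.

```php
<?php
// Hypothetical helper (illustration only): replaces each run of Unicode
// separator (\pZ) or control (\pC) characters with one regular space,
// then trims the ASCII spaces left at both ends.
function normalizeUnicodeSpaces(string $s): string
{
    $replaced = preg_replace('/[\pZ\pC]+/u', ' ', $s);

    // preg_replace() returns null on failure, for example when the /u
    // modifier is set and $s is not valid UTF-8.
    if ($replaced === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    return trim($replaced);
}

// Thin spaces, a regular space and a tab mixed together.
$input = "\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string and XY \t\u{2009}";
echo normalizeUnicodeSpaces($input), "\n"; // test string and XY
```

Whether to collapse runs or keep one space per character (as in the answer above) depends on whether the original spacing width matters for your data.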