php regex unicode string-matching

How to match non-ASCII letters in the latin1_swedish_ci charset?


I have this string: Verbesserungsvorschläge, which I think is German.

Now I want to match it with a regex in PHP. More generally, I want to match characters, like those used in German, that fall outside the ASCII set.


Solution

  • If you're working with an 8-bit character set, the regex [\x80-\xFF] matches any character that is not ASCII. In PHP that would be:

    if (preg_match('/[\x80-\xFF]/', $subject)) {
      # String has non-ASCII characters
    } else {
      # String is pure ASCII or empty
    }
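
To try this against the string from the question, here is a minimal sketch. It assumes the data really is stored as Latin-1, where "ä" is the single byte 0xE4. The helper name hasNonAscii is hypothetical and only wraps the check for illustration.

```php
<?php
// Hypothetical helper (not part of PHP): true if the 8-bit string
// contains at least one byte in the range 0x80-0xFF.
function hasNonAscii(string $subject): bool
{
    // No /u modifier, so the pattern is applied byte by byte.
    return preg_match('/[\x80-\xFF]/', $subject) === 1;
}

// "Verbesserungsvorschläge" written out as Latin-1 bytes (assumption:
// the column data is Latin-1, so "ä" is the single byte 0xE4).
$latin1 = "Verbesserungsvorschl\xE4ge";

var_dump(hasNonAscii($latin1));      // bool(true)
var_dump(hasNonAscii('Vorschlag'));  // bool(false)
```

Without the /u modifier the pattern also fires on UTF-8 input, because every byte of a multibyte UTF-8 sequence is 0x80 or above. So it still answers "is there anything non-ASCII here?", but it matches individual bytes rather than whole characters.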