I had this problem today:
This regex matches only English: [a-zA-Z0-9].
If I need to support any language in the world, what regex should I write?
If you use character class shorthands and a Unicode-aware regex engine, you can do that. The \w
class matches "word characters" (letters, digits, and underscores).
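As a rough illustration, here is a minimal sketch in Python 3, chosen only because its `re` module treats \w as Unicode-aware by default for `str` patterns. The sample text and variable names are made up for the example; other engines may need a flag or a different syntax to get the same result.

```python
import re

# Hypothetical sample text mixing Latin, Cyrillic, CJK, and digits.
text = "hello мир 世界 123"

# The character class from the question: ASCII letters and digits only.
ascii_only = re.findall(r"[a-zA-Z0-9]+", text)

# \w in Python 3 str patterns is Unicode-aware by default.
unicode_words = re.findall(r"\w+", text)

print(ascii_only)     # ['hello', '123']
print(unicode_words)  # ['hello', 'мир', '世界', '123']
```

Note that \w also matches the underscore, so it is not an exact drop-in for [a-zA-Z0-9] if underscores must be rejected.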
Beware of some regex flavors that don't do this so well: JavaScript uses ASCII for \d (digits) and \w, but Unicode for \s (whitespace). XML does it the other way around.
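One way to find out how a given engine treats each shorthand is to probe it with a non-ASCII character. The sketch below does this in Python 3, using the `re.ASCII` flag as a stand-in for an ASCII-only flavor; it does not reproduce JavaScript or XML behavior exactly, and the two probe characters are just convenient examples.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit
nbsp = "\u00a0"          # NO-BREAK SPACE, a non-ASCII whitespace character

# Default (Unicode) matching for str patterns.
unicode_digit = bool(re.fullmatch(r"\d", arabic_three))
unicode_space = bool(re.fullmatch(r"\s", nbsp))

# re.ASCII restricts \d, \w, and \s to their ASCII definitions.
ascii_digit = bool(re.fullmatch(r"\d", arabic_three, re.ASCII))
ascii_space = bool(re.fullmatch(r"\s", nbsp, re.ASCII))

print(unicode_digit, unicode_space)  # True True
print(ascii_digit, ascii_space)      # False False
```

Running the equivalent probes in the target engine shows quickly which shorthands are Unicode-aware there.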