javascript regex multilingual hashtag

Regex for matching HashTags in any language


I have a field in my application where users can enter a hashtag. I want to validate the entry and make sure it would be a proper hashtag. It can be in any language, and it should NOT be preceded by the # sign. I am writing in JavaScript.

So the following are GOOD examples:

And the following are BAD examples:

We had a regex that matched only a-zA-Z0-9. We needed to add language support, so we changed it to reject only whitespace and forgot to also reject special characters, so here I am.

Some other Stack Overflow answers I looked at, which didn't work for me:

  1. Other languages don't work
  2. Again, English only



Solution

  • If your disallowed characters list is thorough (!@#$%^&*()=+./,[{]};:'"?><), then the regex is:

    ^#?[^\s!@#$%^&*()=+./,\[{\]};:'"?><]+$
    

    This allows an optional leading # sign (#?) and disallows the special characters using a negated character class. I added \s (whitespace) to the list and escaped [ and ].
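
    As a rough illustration of how the pattern above behaves, here is a minimal sketch. The helper name isValidHashtag and the sample inputs are assumptions made up for this example; they are not the asker's original GOOD/BAD lists.

    ```javascript
    // Blacklist pattern from the answer: an optional leading '#', then one or
    // more characters that are neither whitespace nor in the disallowed list.
    const HASHTAG_BLACKLIST = /^#?[^\s!@#$%^&*()=+./,\[{\]};:'"?><]+$/;

    // Hypothetical helper, named here for illustration only.
    function isValidHashtag(input) {
      return HASHTAG_BLACKLIST.test(input);
    }

    // Illustrative inputs (assumed, not taken from the question).
    console.log(isValidHashtag("hello"));       // true
    console.log(isValidHashtag("#hello"));      // true  (leading '#' is optional)
    console.log(isValidHashtag("привет"));      // true  (non-Latin letters pass)
    console.log(isValidHashtag("日本語"));       // true
    console.log(isValidHashtag("hello world")); // false (whitespace)
    console.log(isValidHashtag("hello!"));      // false ('!' is blacklisted)
    console.log(isValidHashtag("hello-world")); // true  ('-' is not in the list)
    ```

    If a leading # has to be rejected, as the question asks, removing #? from the pattern should be enough, since # is already in the disallowed list. The last line also shows the weak spot of a blacklist: anything not listed, such as the hyphen, gets through.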

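    Unfortunately, older JavaScript engines don't support constructs like \p{P} (Unicode punctuation) in regexes; Unicode property escapes are only available since ES2018, and only with the u flag. Without that support, you basically have to blacklist characters or take a different approach if the regex solution isn't good enough for your needs.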
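
For runtimes that do implement ES2018 Unicode property escapes with the u flag, one possible "different approach" is a whitelist instead of a blacklist. The sketch below is not part of the original answer. It assumes a hashtag may contain only letters, combining marks, numbers, and underscores, and that a leading # is rejected as the question asks. The name isUnicodeHashtag and the sample inputs are hypothetical.

```javascript
// Whitelist sketch (assumption: the runtime supports ES2018 property escapes).
// \p{L} = letters, \p{M} = combining marks, \p{N} = numbers, plus underscore.
const HASHTAG_UNICODE = /^[\p{L}\p{M}\p{N}_]+$/u;

// Hypothetical helper, named here for illustration only.
function isUnicodeHashtag(input) {
  return HASHTAG_UNICODE.test(input);
}

// Illustrative inputs (assumed, not taken from the question).
console.log(isUnicodeHashtag("hello"));       // true
console.log(isUnicodeHashtag("привет"));      // true
console.log(isUnicodeHashtag("日本語"));       // true
console.log(isUnicodeHashtag("tag_2024"));    // true
console.log(isUnicodeHashtag("#hello"));      // false ('#' is not whitelisted)
console.log(isUnicodeHashtag("hello world")); // false (whitespace)
console.log(isUnicodeHashtag("hello-world")); // false (punctuation)
```

Compared with the blacklist, this rejects any punctuation or symbol without having to enumerate it. The trade-off is that the allowed categories are a guess at what a "proper" hashtag is and may need adjusting.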