python regex hashtag

Regex to match a condition UNLESS it is a hashtag


I am trying to write a regex that removes digits, or words that contain digits, but only when they are not part of a hashtag. I can successfully match words that contain digits, but I cannot work out how to add a condition that ignores words beginning with a #.

Here is a test string that I have been using to try and find a solution:

happening bit mediacon #2022ppopcon wearing stell naman today #sb19official 123 because h3llo also12 or 23old

I need a regex command that will capture the 123, h3llo, also12 and 23old but ignore the #2022ppopcon and #sb19official strings.

I have tried the following regex statements.

(#\w+\d+\w*)|(\w+\d+\w*) successfully captures the hashtags in group 1 and the non-hashtags in group 2, but I cannot figure out how to make it select only group 2.

(?<!#)\w*\d+\w* excludes the first character after the # but still captures all the remaining characters of the hashtag. For example, in the string #2022ppopcon, it skips #2 and captures 022ppopcon.


Solution

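  • You might assert a whitespace boundary to the left, so that a match can only start at the beginning of the string or directly after whitespace, then match optional non-digit word characters, a digit, and the rest of the word: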

    (?<!\S)[^\W\d]*\d\w*
    

    See a regex demo.

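    If you want to allow a partial match (for example, a word directly preceded by punctuation other than #), you can instead use a negative lookbehind asserting that there is no # directly to the left, followed by a word boundary: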

    (?<!#)\b[^\W\d]*\d\w*
    

    See another regex demo.
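
As a minimal sketch of how the two patterns above could be used from Python's re module: the helper names find_tokens and remove_tokens, the constants STRICT and PARTIAL, and the second test string are illustrative assumptions, not part of the original answer.

```python
import re

# Strict variant: a match may only start at the beginning of the string or
# directly after whitespace, so anything glued to '#' is skipped.
STRICT = re.compile(r"(?<!\S)[^\W\d]*\d\w*")

# Partial variant: a match may start at any word boundary that is not
# directly preceded by '#'.
PARTIAL = re.compile(r"(?<!#)\b[^\W\d]*\d\w*")


def find_tokens(text, pattern=STRICT):
    """Return every digit-containing token matched by the pattern (hypothetical helper)."""
    return pattern.findall(text)


def remove_tokens(text, pattern=STRICT):
    """Delete the matched tokens and collapse the leftover whitespace (hypothetical helper)."""
    return " ".join(pattern.sub("", text).split())


sample = (
    "happening bit mediacon #2022ppopcon wearing stell naman today "
    "#sb19official 123 because h3llo also12 or 23old"
)

print(find_tokens(sample))
# ['123', 'h3llo', 'also12', '23old']

print(remove_tokens(sample))
# happening bit mediacon #2022ppopcon wearing stell naman today #sb19official because or

# The variants differ when a word is preceded by punctuation other than '#'.
print(find_tokens("(h3llo) #tag1 x2", STRICT))   # ['x2']
print(find_tokens("(h3llo) #tag1 x2", PARTIAL))  # ['h3llo', 'x2']
```

On the question's test string both variants give the same result. They only diverge for input such as (h3llo), where the strict pattern rejects the word because a non-whitespace character precedes it.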