python regex parsing telegram regex-group

The regular expression for parsing Telegram usernames stops parsing valid usernames if there is an invalid username in the same line


I have a regular expression that extracts the username from Telegram links, covering everything from a plain "@" mention to username.t.me links.

Image showing the problem with the regular expression

The problem is that if I enter @aaaa, @jfewewf, both usernames match correctly, but if I enter @aaaa, @jfewewf_, neither username matches, even though the pattern should match only @aaaa (the username on the right is invalid because it ends with an underscore).

Here is my regex:

(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))(?P<username>(?!.*__)(?!.*_$)(?!.*_{2,})[a-z][a-z0-9_]{3,31})(?P<subdomain>\.t\.me)?

You can test it at this link: https://regex101.com/r/JFF1S0/9

Please help me 🙏🙏🙏

I've already tried almost everything and I don't know how to solve it.
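The behaviour described above can be reproduced outside regex101 with a minimal Python sketch. It assumes the pattern is applied with `re.finditer` and no flags (the flags used in the regex101 link are not stated here), and `find_usernames` is a hypothetical helper name used only for illustration:

```python
import re

# Original pattern from the question, copied verbatim and split across
# three string literals only for readability.
ORIGINAL = re.compile(
    r"(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)"
    r"|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))"
    r"(?P<username>(?!.*__)(?!.*_$)(?!.*_{2,})[a-z][a-z0-9_]{3,31})(?P<subdomain>\.t\.me)?"
)


def find_usernames(text):
    """Return every 'username' group matched in text (hypothetical helper)."""
    return [m.group("username") for m in ORIGINAL.finditer(text)]


print(find_usernames("@aaaa, @jfewewf"))   # ['aaaa', 'jfewewf']
print(find_usernames("@aaaa, @jfewewf_"))  # []
```

The second call returns nothing at all, which is the unwanted result: the trailing underscore of the last username also blocks the valid @aaaa.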


Solution

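  • To fix this, remove (?!.*_$) from the username group and add (?<!_)\b after [a-z][a-z0-9_]{3,31}. The lookahead (?!.*_$) scans to the end of the line, so an underscore at the very end of the line rejects every candidate on that line. (?<!_)\b checks only the end of the username itself: it must not end with "_" and must stop at a word boundary.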

    Here is the updated regular expression:

    (?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))(?P<username>(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\b)(?P<subdomain>\.t\.me)?
    

    https://regex101.com/r/JFF1S0/10

    Original answer from Telegram chat:

    regex101: this checks to the end of the line, so it finds the "_" at the end and rejects the entire line [1]

    try https://regex101.com/r/C6FZER/1 [1]

    I removed your (?!.*_$) and added (?<!_)\b after [a-z][a-z0-9_]{3,31} [3]
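As a quick check of the updated expression, here is a minimal Python sketch. It assumes the pattern is used with `re.finditer` and no flags; `FIXED` and `find_usernames` are hypothetical names, and `example_user` / `example` are made-up usernames chosen only for illustration:

```python
import re

# Updated pattern from the answer: (?!.*_$) is removed and (?<!_)\b is
# added at the end of the username group. Split only for readability.
FIXED = re.compile(
    r"(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)"
    r"|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))"
    r"(?P<username>(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\b)(?P<subdomain>\.t\.me)?"
)


def find_usernames(text):
    """Return every 'username' group matched in text (hypothetical helper)."""
    return [m.group("username") for m in FIXED.finditer(text)]


# The invalid username no longer blocks the valid one on the same line.
print(find_usernames("@aaaa, @jfewewf_"))           # ['aaaa']
print(find_usernames("@aaaa, @jfewewf"))            # ['aaaa', 'jfewewf']
# Link and subdomain forms still match.
print(find_usernames("https://t.me/example_user"))  # ['example_user']
print(find_usernames("example.t.me"))               # ['example']
```

For @jfewewf_ the engine backtracks to shorter candidates such as jfewewf, but \b fails there because the next character "_" is a word character, so the invalid username is rejected as a whole instead of being matched in part.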