Tags: javascript, regex, regex-lookarounds, negative-lookbehind, word-boundary

Negative Lookahead & Lookbehind with Capture Groups and Word Boundaries


We are auto-formatting hyperlinks in a message composer but would like to avoid matching links that are already formatted.

Attempt: Build a regex that uses a negative lookbehind and negative lookahead to exclude matches where the link is surrounded by href=" and ".

Problem: Negative lookbehind/lookahead are not working with our regex:

Regex:

/(?<!href=")(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_+.~#?&\/\/=;]*)(?!")/g

Usage:

html.match(/(?<!")(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=;]*)(?!")/g);

When testing, we noticed that swapping the negative lookahead/lookbehind for their positive counterparts makes the regex behave as expected. So only the negative lookarounds appear to have no effect.

Does anyone know why these negative lookbehind/lookaheads are not functioning with this regex?

Thank you!


Solution

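  • With @Barmar's help in the question comments, it became clear that the problem lies in the optional parts at the beginning and end of the regex. The engine can skip the optional protocol and www. groups and start the match a few characters later, where the lookbehind no longer sees the quote. It can also backtrack the trailing character class by one character, so that the lookahead no longer sees the closing quote.

    "Basically, anything that allows something to be optional next to a negative lookaround may negate the effect of the lookaround, if it can find a shorter match that isn't next to it."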
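
    To make the effect concrete, here is a minimal sketch. It uses a simplified stand-in for the regex in the question (lowercase ASCII only, smaller character classes), and the names naive, strict, linked, plain and findAll are made up for illustration. The strict variant is only one possible workaround, not necessarily what was used in the end: its lookbehind also rejects positions preceded by a URL character, and its lookahead also rejects a following URL character, so the engine cannot slide into the middle of the link or give back a trailing character to escape the quote.

```javascript
// Simplified stand-in for the question's pattern (assumption: lowercase ASCII URLs only).
const naive = /(?<!")(https?:\/\/)?(www\.)?[a-z0-9]+\.[a-z]{2,6}\b([\/a-z0-9]*)(?!")/g;

// Hypothetical stricter variant: the lookarounds also reject neighbouring URL
// characters, so a shorter match inside an existing link is no longer possible.
const strict = /(?<!["\w\/:.])(https?:\/\/)?(www\.)?[a-z0-9]+\.[a-z]{2,6}\b([\/a-z0-9]*)(?!["\w\/])/g;

const linked = 'href="http://example.com/path"'; // already formatted, should be skipped
const plain = 'see www.test.org/page now';       // bare link, should be found

// String.prototype.match returns null when nothing matches; normalize to [].
const findAll = (re, text) => text.match(re) ?? [];

console.log(findAll(naive, linked));  // [ 'example.com/pat' ]  (shorter match slips through)
console.log(findAll(strict, linked)); // []
console.log(findAll(naive, plain));   // [ 'www.test.org/page' ]
console.log(findAll(strict, plain));  // [ 'www.test.org/page' ]
```

    Note the trade-off in this sketch: a bare link that the user has wrapped in plain double quotes would also be skipped, so the lookaround sets may need tuning for real messages.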