javascript regex backreference regex-alternation

Match altered version of first match with only one expression?


I'm writing a brush for Alex Gorbatchev's Syntax Highlighter to get highlighting for Smalltalk code. Now, consider the following Smalltalk code:

aCollection do: [ :each | each shout ]

I want to find the block argument ":each" and then match "each" every time it occurs afterwards (for simplicity, let's say every occurrence and not just those inside the brackets). Note that the argument can have any name, e.g. ":myArg".

My attempt to match ":each":

\:([\d\w]+)

This seems to work. My problem is matching the later occurrences of "each". I thought something like this could work:

\:([\d\w]+)|\1

But the right hand side of the alternation seems to be treated as an independent expression, so backreferencing doesn't work.
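For illustration, here is a minimal check of that behaviour in JavaScript (the variable names are only for this sketch). Each branch of the alternation is tried on its own at every position, and in JavaScript a backreference to a group that has not participated in the match matches the empty string, so the right-hand side never finds "each":

```javascript
const code = "aCollection do: [ :each | each shout ]";

// The pattern from the question: either the declaration, or a backreference.
const pattern = /\:([\d\w]+)|\1/g;

// When the left branch fails, group 1 is unset, so \1 matches the empty
// string. Dropping those empty matches leaves only the declaration itself.
const nonEmpty = code.match(pattern).filter((s) => s.length > 0);

console.log(nonEmpty); // [ ':each' ]
```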

Is it even possible to accomplish what I want in a single expression? Or would I have to use the backreference within a second expression (via another function call)?


Solution

  • You could do it in regex flavors that support variable-length lookbehind (AFAIK the .NET framework languages do, Perl 6 might). There you could highlight a word if it matches (?<=:(\w+)\b.*)\1. But JavaScript only gained lookbehind with ES2018; older engines don't support it at all.

    But anyway this regex would be very inefficient (I just checked a simple example in RegexBuddy, and the regex engine needs over 60 steps for nearly every character in the document to decide between match and non-match), so this is not a good idea if you want to use it for code highlighting.

    I'd recommend you use the two-step approach you mentioned: First match :(\w+)\b (word boundary inserted for safety, \d is implied in \w), then do a literal search for match result \1.
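The two-step approach could be sketched in JavaScript roughly as follows. This is only an illustration: findBlockArgOccurrences is a hypothetical helper, not part of SyntaxHighlighter's API, and wrapping the captured name in \b so that only whole words match is my assumption on top of the plain literal search described above.

```javascript
// Hypothetical helper (not a SyntaxHighlighter function): returns every
// later occurrence of the first block argument declared in `code`.
function findBlockArgOccurrences(code) {
  // Step 1: find the declaration, e.g. ":each"; group 1 holds the bare name.
  const decl = /:(\w+)\b/.exec(code);
  if (decl === null) return [];

  // \w+ only matches letters, digits and underscores, so the captured name
  // contains no regex metacharacters and can be embedded without escaping.
  const name = decl[1];

  // Step 2: search for that name as a whole word (assumption: whole words
  // only), starting right after the declaration.
  const usage = new RegExp("\\b" + name + "\\b", "g");
  usage.lastIndex = decl.index + decl[0].length;

  const hits = [];
  let m;
  while ((m = usage.exec(code)) !== null) {
    hits.push({ index: m.index, text: m[0] });
  }
  return hits;
}

// One hit: "each" at index 26, the use inside the block body.
console.log(findBlockArgOccurrences("aCollection do: [ :each | each shout ]"));
```

Note that this sketch only handles the first block argument it finds; a block with several arguments (":a :b") would need the first step run with a global regex and the second step repeated per name.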