This is getting a little meta, but I'm trying to figure out a regex to match regexes for syntax highlighting purposes. There's a nice long backstory, but in the interest of brevity I'll skip it. Here's what I'm trying to do: I need to match a comment (preceded by # and terminated at the end of the line) only if it is not inside a character class ([...]), although it should be matched if there is a complete (closed) character class earlier in the line.
The complicating factor is escaped square brackets. A plain [ earlier in the line that is not followed by a closing ] means we're still inside a character class, so a # there is not a comment. But an escaped bracket \[ could also be present, with or without a closing escaped bracket \].
Maybe some examples will help. Here are some instances where a comment should be matched:
(\h{8}-\h{4}-\h{4}-\h{4}-\h{12}) # match UUID
(no square brackets at all)
([A-Za-z_][A-Za-z0-9_]*) # valid Python identifier
(paired unescaped square brackets)
(\||\[|\?) # match some stuff
(escaped opening square bracket)
Here is an example of where an "attempted comment" should not be matched:
[A-Za-z # letters
0-9_-.] # numbers and other characters
(the first line should not be matched, the second one is fine)
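The rule can also be checked without a regex at all. Below is a minimal sketch of a character-by-character scanner in Python, offered as a swapped-in technique rather than something from the question. The function name find_comment_start is hypothetical, and it assumes a simplified grammar: a backslash escapes the next character, an unescaped [ opens a class, and the next unescaped ] closes it. It does not handle a literal ] placed first in a class ([]a]) or POSIX classes such as [[:alpha:]].

```python
from typing import Optional


def find_comment_start(line: str) -> Optional[int]:
    """Return the index of the # that starts a comment, or None.

    Simplified rules (assumptions, not a full regex grammar):
    - a backslash escapes the next character, whatever it is;
    - an unescaped [ opens a character class, the next unescaped ] closes it;
    - a # starts a comment only when we are outside a character class.
    """
    in_class = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \[ or \]
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "#":
            return i
        i += 1
    return None


for line in [r"[A-Za-z # letters", r"0-9_-.] # numbers and other characters"]:
    i = find_comment_start(line)
    print(repr(line), "->", None if i is None else line[i:])
```

Each line is scanned independently, which matches the question's examples: on the second line the stray ] is outside any class, so the comment is accepted.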
I'm by no means a regex master (which is why I'm asking this question!). I've tried fiddling with positive and negative lookbehinds, including nesting them, but I've had zero luck except with
(?<!\[)((#+).*$)
which matches a comment only if not preceded by an opening square bracket. Once I started nesting the lookarounds, though, and trying to match if the opener was preceded by an escape, I got stumped. Any help would be ... helpful.
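To see why that lookbehind isn't enough, here is a quick check using Python's re module, assumed here as a stand-in for whatever engine the highlighter actually uses. The lookbehind inspects only the single character immediately before the #, so an unclosed [ further left goes unnoticed:

```python
import re

# The lookbehind attempt from the question: a run of # whose preceding
# character is not an opening square bracket.
ATTEMPT = re.compile(r"(?<!\[)((#+).*$)")

# The character right before # is a space, so the lookbehind succeeds even
# though the [ earlier in the line is never closed.
m = ATTEMPT.search("[A-Za-z # letters")
print(m.group(1) if m else None)

# Only a [ immediately before the # is rejected.
print(ATTEMPT.search("[#x"))
```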
It is rather simple, but it works with the cases from your example. Try this:
(?<=[\][)]\s)(#(.*))$
It matches a comment only if it is preceded by a bracket and a space (the class [\][)] accepts any of ], [ or ) in that position).
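A quick check of this pattern against the question's examples, using Python's re as an assumed stand-in engine (it supports this fixed-width lookbehind):

```python
import re

# Pattern from this answer: the # must follow a bracket character and one
# whitespace character. The class [\][)] contains ], [ and ).
SIMPLE = re.compile(r"(?<=[\][)]\s)(#(.*))$")

lines = [
    r"(\h{8}-\h{4}-\h{4}-\h{4}-\h{12}) # match UUID",
    r"([A-Za-z_][A-Za-z0-9_]*) # valid Python identifier",
    r"(\||\[|\?) # match some stuff",
    r"[A-Za-z # letters",
    r"0-9_-.] # numbers and other characters",
]
for line in lines:
    m = SIMPLE.search(line)
    print(repr(line), "->", m.group(1) if m else None)
```

Because it relies only on the two characters just before the #, a comment on a line with no brackets at all, such as abc # comment, is not matched.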
As I thought, your case is much more complicated, so maybe try this one:
^(?=(?:[-\w\d?*.+|{}\\\/\s<>\]]|(?:\\[\[\]()]))+(#+.*)$)|^(?=^[\[(].+?[\])]\s*(#+.*)$)
It matches only through its capture groups: the pattern itself consumes no text, because it uses only positive lookaheads, but capture groups inside lookarounds are allowed. If you would rather match directly, consume more text and then extract the comment from a group with something like:
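A sketch of reading the comment out of the lookahead version, assuming Python's re module as the engine; the helper comment_via_lookahead is hypothetical. The overall match is empty, and the comment comes from whichever group the matching branch filled:

```python
import re

# Lookahead-only pattern from the answer, split across two string literals
# that Python concatenates into one pattern: ^(?=...)|^(?=...).
LOOKAHEAD = re.compile(
    r"^(?=(?:[-\w\d?*.+|{}\\\/\s<>\]]|(?:\\[\[\]()]))+(#+.*)$)"
    r"|^(?=^[\[(].+?[\])]\s*(#+.*)$)"
)


def comment_via_lookahead(line):
    """Return the captured comment, or None if neither branch matches."""
    m = LOOKAHEAD.search(line)
    if m is None:
        return None
    # Group 1 is set by the first branch, group 2 by the second.
    return m.group(1) or m.group(2)


print(comment_via_lookahead(r"(\h{8}-\h{4}-\h{4}-\h{4}-\h{12}) # match UUID"))
print(comment_via_lookahead(r"[A-Za-z # letters"))
```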
^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(#+.*)$
However, in both cases you would probably need to add more characters that occur in regular expressions to the first alternative, (?:[-\w\d?*.+|{}\\\/\s<>\]]). For example, if you also want it to match the comment in
(\[ # works if escaped [ is in group
you need to add ( to that alternative. But I am not sure whether that is what you wanted.
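A minimal check in Python's re (an assumed engine; the names DIRECT and DIRECT_WITH_PAREN are illustrative) showing that the direct-match pattern misses the escaped-bracket case, and that adding ( to the first alternative fixes it:

```python
import re

# Direct-match pattern from the answer; the comment is in group 1.
DIRECT = re.compile(
    r"^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(#+.*)$"
)
# Same pattern with ( added to the first alternative.
DIRECT_WITH_PAREN = re.compile(
    r"^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]\(])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(#+.*)$"
)

line = r"(\[ # works if escaped [ is in group"
# ( is not in the first alternative, and there is no closing ] or ) for the
# third alternative, so the original pattern finds nothing.
print(DIRECT.search(line))
print(DIRECT_WITH_PAREN.search(line).group(1))
```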
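Or try this version, which also accepts ( in the first alternative and captures the comment in a named group, valid or invalid: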
^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]\(])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(?<valid>(?:#+).*)$|^[-\[\w\d?*.+|{}\\\/\s<>\(]+(?<invalid>(?:#+).*)$
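A sketch of this final pattern ported to Python's re module, which is an assumption about the target engine. Python spells named groups (?P<name>...), so (?<valid>...) and (?<invalid>...) are rewritten accordingly; nothing else in the pattern changes. The classify helper is hypothetical:

```python
import re

# The answer's final pattern with Python-style named groups.
CLASSIFY = re.compile(
    r"^(?:(?:[-\w\d?*.+|{}\\\/\s<>\]\(])|(?:\\[\[\]()])|^[\[(].+?[\])])+\s*(?P<valid>(?:#+).*)$"
    r"|^[-\[\w\d?*.+|{}\\\/\s<>\(]+(?P<invalid>(?:#+).*)$"
)


def classify(line):
    """Return ("valid" or "invalid", comment text), or None if no branch matches."""
    m = CLASSIFY.search(line)
    if m is None:
        return None
    if m.group("valid") is not None:
        return ("valid", m.group("valid"))
    return ("invalid", m.group("invalid"))


for line in [
    r"(\h{8}-\h{4}-\h{4}-\h{4}-\h{12}) # match UUID",
    r"[A-Za-z # letters",
    r"(\[ # works if escaped [ is in group",
]:
    print(classify(line))
```

The second branch only runs when the first fails; its class includes [ but the first branch's does not, which is how a comment after an unclosed [ ends up in invalid.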