I've been using regex for years, and I've read several tutorials and references (the Emacs regex reference is my bible), but I still have trouble understanding matching. Is there a good, comprehensive tutorial on regex matching with abundant examples? Can anybody give me a link where I can finally understand regex matching deeply?
Here is an example of the problem that bothers me.
haystack = "[{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}]"
pattern = "({.+})"
The result is:
{one, {one, andahalf}}
{{two, zero}, two}
{{threezero}, three}
{four}
Now, what exactly is that? Greedy or non-greedy (it's C# Regex.Matches)?
Why, oh why, isn't the (non-greedy) result:
{one, {one, andahalf}
{{two, zero}
{{threezero}
{four}
(matching the first possible pair of {})
Or (greedy):
{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}
(matching the largest possible pair of {})
Of course, the actual result is exactly what I need, and I'm very happy that regex reads my mind, but I'd rather read its mind :-D So, does anybody have a decent tutorial on regex matching that will help me understand how this match did what it did?
The reason this happened to work is that those groups are separated by newlines, and by default the dot (in the .+ part of your regex) matches any character except a newline. To change that behaviour, compile the regex with RegexOptions.Singleline set.
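To see the effect of the newline rule, here is a minimal sketch using Python's re module as a stand-in for the .NET engine (an assumption: the question is about C#, but both engines leave the dot unable to match a newline by default). In Python, re.DOTALL plays the role of RegexOptions.Singleline.

```python
import re

# The haystack from the question; "\n" is a real newline character here.
haystack = "[{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}]"
pattern = r"(\{.+\})"

# Default: '.' does not match '\n', so every greedy match is confined to one line.
per_line = re.findall(pattern, haystack)

# re.DOTALL (Python's counterpart of .NET RegexOptions.Singleline) lets '.'
# cross newlines, so the greedy '.+' runs to the last '}' in the whole string.
whole = re.findall(pattern, haystack, flags=re.DOTALL)

print(per_line)
print(whole)
```

Without the flag you get the four per-line matches from the question; with it you get the single "greedy" match the question expected.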
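So the match is greedy: on each line, .+ consumes everything up to the newline and then backtracks to the last } on that line. It's just a coincidence that the braces were correctly balanced during this greedy match.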
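The coincidence is easier to see by comparing a greedy and a lazy quantifier. This sketch again uses Python's re as a stand-in for .NET; the lazy .+? variant and the short one-line test string are illustrative additions, not part of the question.

```python
import re

haystack = "[{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}]"

# Greedy '.+': grab the rest of the line, then backtrack until '\}' matches,
# so each match ends at the LAST '}' on its line.
greedy = re.findall(r"\{.+\}", haystack)

# Lazy '.+?': extend one character at a time, so each match ends at the
# FIRST '}' after its opening '{', ignoring any nesting.
lazy = re.findall(r"\{.+?\}", haystack)

# Hypothetical one-line input: the greedy match swallows two separate groups,
# which shows that the regex never counts braces.
spans_two_groups = re.findall(r"\{.+\}", "{one}, {two}")

print(greedy)
print(lazy)
print(spans_two_groups)
```

The lazy version produces exactly the "non-greedy" list from the question, while the greedy one only looks balanced because each group happens to end its own line.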
A good regex tutorial can be found at http://www.regular-expressions.info.
By the way, for safety, braces should always be escaped (\{, \}). The .NET regex engine happens to recognize that they can't mean a quantifier in this context, but some other engines will fail to compile this regex.
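A short sketch, once more using Python's re as a stand-in (whether an engine accepts an unescaped { varies, so treat this as one engine's behaviour), showing that escaping does not change the result here, and what happens when braces really do form a quantifier:

```python
import re

haystack = "[{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}]"

# Escaped braces are literal characters, whatever follows them.
escaped = re.findall(r"(\{.+\})", haystack)

# Unescaped: Python's re, like .NET, treats '{' as a literal here because
# '{.' cannot start an '{m,n}' quantifier.
unescaped = re.findall(r"({.+})", haystack)

# Where braces DO form a quantifier, escaping changes the meaning entirely.
quantifier = re.fullmatch(r"a{2}", "aa")    # 'a' repeated exactly twice
literal = re.fullmatch(r"a\{2\}", "a{2}")   # the literal text "a{2}"

print(escaped == unescaped)
print(quantifier is not None, literal is not None)
```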