I've a stupidly simple question for someone - but I can't answer it myself. I've a regex pattern that works in two different online testers, one of which is .net based.
Yet here it finds no matches. Can anyone help? The purpose is to filter a lovely page of F# cheats so that it is printable :).
I'm mentoring my youngest brother, who is on week 4 of learning to code. This is his function, and I confess it has stumped me! I'd be very grateful for any help!
    Public Function FindCode(input As String)
        Dim pattern As String = "(?m)(<pre>)(.+)(<\/pre>)\B"
        Dim output As New Dictionary(Of Integer, String)
        Dim count As Integer
        For Each match As Match In Regex.Matches(input, pattern)
            output.Add(count, match.Value)
            count += 1
        Next
        Return output.count
    End Function
I don't get exceptions; I just get no matches.
An example would be
Some random markup <pre> and this stuff in the middle is what I'm after </pre> and there </pre> lots of these in one file </pre> which when I use Regexhero <pre> finds all the tags </pre>
The idea is then to use the groups to list all the items between the <pre> and </pre> tags.
Thanks for such quick responses!
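First, I tried the expression you provided in Expresso and then in LinqPad. Both returned the entire string, which is not what you intended to match. I see 2 issues why it is not showing the desired result:
1. In your sample text, every <pre> must be closed by </pre> (the sample reads "and there </pre> lots of these", where an opening <pre> was presumably meant).
2. The .+ in the pattern also matches the tags themselves, so a single match runs on to the last </pre>.
Besides that, I suggest some improvements to the code: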
Take a look at the code below, which works fine (I've added some optional, commented-out .Dump() statements for LinqPad in case you want to print out the values for debugging):
    ' Needs: Imports System.Text.RegularExpressions
    Public Function FindCode(input As String, tagName As String, includeTags As Boolean) As Dictionary(Of Integer, String)
        Const grpName As String = "pregroup"
        ' The named group captures the text between the opening and closing tag.
        Dim pattern As String = "(<" & tagName & ">)(?<" & grpName & ">(\s|\w|')+)(</" & tagName & ">)"
        Dim output As New Dictionary(Of Integer, String)
        Dim count As Integer
        Dim options As RegexOptions = RegexOptions.IgnoreCase _
            Or RegexOptions.IgnorePatternWhitespace _
            Or RegexOptions.Multiline Or RegexOptions.ExplicitCapture
        ' options.Dump("options")
        Dim rx As New Regex(pattern, options)
        For Each m As Match In rx.Matches(input)
            Dim val As String = Nothing
            If includeTags Then
                val = m.Value                     ' whole match, tags included
            ElseIf m.Groups(grpName).Success Then
                val = m.Groups(grpName).Value     ' only the text between the tags
            End If
            If val IsNot Nothing Then
                ' val.Dump("Found #" & count+1)
                output.Add(count, val)
                count += 1
            End If
        Next
        Return output
    End Function
Regarding the expression:
- Use (\s|\w)+ instead of .+, because it includes only whitespace and alphanumeric characters, not brackets, and hence not the tags.
- Characters can also be written as \xnn (where nn is the hex code of the character). Note: this is not applicable here.
Regarding the Regex code: I have added the parameter includeTags so you can see the difference (false excludes the tags, true includes them). Note that you should always set the RegexOptions deliberately, as they affect the way the expression is matched.
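To make the point about RegexOptions concrete, here is a minimal sketch of what two of the options used above change. The module name, the helper GroupCount, and the group name "body" are made up for this illustration and are not part of the code above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexOptionsDemo
    ' Same shape as the pattern above, with a hypothetical group name "body".
    Const Pattern As String = "(<pre>)(?<body>(\s|\w)+)(</pre>)"

    ' Hypothetical helper: number of groups (including group 0) in the first match.
    Function GroupCount(input As String, options As RegexOptions) As Integer
        Return Regex.Match(input, Pattern, options).Groups.Count
    End Function

    Sub Main()
        Dim input As String = "<PRE> upper case tags </PRE>"

        ' Matching is case-sensitive unless IgnoreCase is set.
        Console.WriteLine(Regex.IsMatch(input, Pattern))                          ' False
        Console.WriteLine(Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase)) ' True

        ' Without ExplicitCapture: group 0, three unnamed groups, and "body".
        Console.WriteLine(GroupCount(input, RegexOptions.IgnoreCase))             ' 5

        ' With ExplicitCapture only named groups are kept: group 0 and "body".
        Console.WriteLine(GroupCount(input, RegexOptions.IgnoreCase Or RegexOptions.ExplicitCapture)) ' 2
    End Sub
End Module
```

ExplicitCapture is why FindCode can keep the plain parentheses around the tags without them turning into extra numbered groups.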
Finally, here's the main code:
    Sub Main
        Dim input As String = "Some random markup <pre> and this stuff in the middle is what I'm after </pre> and there <pre> lots of these in one file </pre> which when I use Regexhero <pre> finds all the tags </pre>"
        Dim result = FindCode(input, "pre", False)
        Dim count As Integer = result.Count()
        Console.WriteLine(String.Format("Found string {0} times.", count))
        Console.WriteLine("Findings:")
        For Each s In result
            Console.WriteLine(String.Format("'{0}'", s.Value))
        Next
    End Sub
With (\s|\w)+ as the content of the group (that is, without the ' alternative), this will output:
Found string 2 times.
Findings:
' lots of these in one file '
' finds all the tags '
However, there is still one question left: why isn't the first <pre>...</pre> matched?
Take a look at the substring "I'm after": it contains ', which isn't matched because it is neither whitespace nor alphanumeric. You can add it by specifying (\s|\w|') in the regular expression, as the pattern in FindCode above already does; then it will show all 3 strings.
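As a rough way to see the three cases side by side, the sketch below counts the matches for each variant of the group content on the sample input. The module name, the helper CountMatches, and the group name "content" are hypothetical and exist only for this comparison.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternComparison
    ' Hypothetical helper: counts matches of <pre>BODY</pre> for a given body sub-pattern.
    Function CountMatches(input As String, body As String) As Integer
        Dim pattern As String = "<pre>(?<content>" & body & ")</pre>"
        Return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count
    End Function

    Sub Main()
        Dim input As String = "Some random markup <pre> and this stuff in the middle is what I'm after </pre> and there <pre> lots of these in one file </pre> which when I use Regexhero <pre> finds all the tags </pre>"

        ' Greedy .+ swallows the inner tags: one match from the first <pre> to the last </pre>.
        Console.WriteLine(CountMatches(input, ".+"))          ' 1

        ' Whitespace and word characters only: the first block fails on the apostrophe in "I'm".
        Console.WriteLine(CountMatches(input, "(\s|\w)+"))    ' 2

        ' Allowing the apostrophe as well matches all three blocks.
        Console.WriteLine(CountMatches(input, "(\s|\w|')+"))  ' 3
    End Sub
End Module
```

The character class [\s\w']+ accepts the same characters as (\s|\w|')+ and would be an equivalent way to write the last variant.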