.net regex vb.net

Visual Basic .net Regex match does not work - despite working on a testing tool?


I've a stupidly simple question for someone - but I can't answer it myself. I've a regex pattern that works in two different online testers, one of which is .net based.

Yet here it finds no matches. Can anyone help? The purpose is to filter a lovely page of F# cheats so that it is printable :).

I'm mentoring my youngest brother, he's on week 4 of learning to code - this is his function and I confess it's stumped me! Any help I'd be very grateful for!!

  Public Function FindCode(input As String)
    Dim pattern As String = "(?m)(<pre>)(.+)(<\/pre>)\B"
    Dim output As New Dictionary(Of Integer, String)
    Dim count As Integer

    For Each match As Match In Regex.Matches(input, pattern)
        output.Add(count, match.Value)
        count += 1
    Next
    Return output.Count
End Function

I don't get exceptions, I just get no matches.

An example would be

Some random markup <pre> and this stuff in the middle is what I'm after </pre> and there </pre> lots of these in one file </pre> which when I use Regexhero <pre> finds all the tags  </pre> 

This way we could perhaps use the groups to list all the items between the <pre> and </pre> tags.

Thanks for such quick responses!


Solution

  • First, I tried the expression you provided in Expresso and then in LinqPad. Both returned the entire string, which is not what you intended to match. I see 2 reasons why it does not show the desired result:

    1. The regex expression itself
    2. A problem in the example string: the tags are not paired (each <pre> must be closed by a </pre>, but the second block starts with </pre> instead of <pre>)
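To make the first issue concrete, here is a minimal sketch that runs the original pattern against the sample with the stray </pre> corrected to <pre>. The module name GreedyVsLazy and the helper CountMatches are made up for illustration. The greedy .+ runs from the first <pre> to the last </pre>, so the whole lot comes back as a single match, whereas a lazy .+? stops at the nearest closing tag.

```vb
Imports System
Imports System.Text.RegularExpressions

Module GreedyVsLazy
    ' Hypothetical helper: number of matches of pattern in input.
    Function CountMatches(input As String, pattern As String) As Integer
        Return Regex.Matches(input, pattern).Count
    End Function

    Sub Main()
        ' Sample from the question, with the second tag pair corrected.
        Dim input As String = "Some random markup <pre> and this stuff in the middle is what I'm after </pre> and there <pre> lots of these in one file </pre> which when I use Regexhero <pre> finds all the tags  </pre>"

        ' Original pattern: the greedy .+ swallows everything up to the last </pre>.
        Console.WriteLine(CountMatches(input, "(?m)(<pre>)(.+)(<\/pre>)\B"))   ' 1

        ' Lazy .+? stops at the first </pre> it can reach.
        Console.WriteLine(CountMatches(input, "<pre>(.+?)</pre>"))             ' 3
    End Sub
End Module
```

The lazy quantifier is only one possible fix. It also accepts the apostrophe in "I'm", because . matches any character except a newline.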

    Besides that, I suggest some improvements to the code:

    1. Change the way you're matching (example below uses Regex options, and allows grouping)
    2. Add tagName as parameter, add parameter to allow inclusion or exclusion of the tags
    3. Return the collection instead of the count value

    Take a look at the code; it works fine (I've added some optional, commented-out .Dump() statements for LinqPad in case you want to print the values for debugging):

    Public Function FindCode(input As String, tagName As String, includeTags As Boolean) As Dictionary(Of Integer, String)
        Const grpName as string = "pregroup"
        Dim pattern As String = "(<"+tagName+">)(?<"+grpName+">(\s|\w)+)(</"+tagName+">)"
        Dim output As New Dictionary(Of Integer, String)
        Dim count As Integer
        
        Dim options as RegexOptions = RegexOptions.IgnoreCase _
              or RegexOptions.IgnorePatternWhitespace _
              or RegexOptions.MultiLine or RegexOptions.ExplicitCapture
        ' options.Dump("options")
        Dim rx as Regex = new Regex(pattern, options)
        For Each m As Match In rx.Matches(input)
            Dim val as string=nothing
            if (includeTags) 
                val = m.Value
            else
                if(m.Groups(grpName).Success)
                    val = m.Groups(grpName).Value 
                end if
            end if
            if not (val is nothing)
                ' val.Dump("Found #" & count+1)
                output.Add(count, val)
                count += 1
            end if
        Next    
        Return output
    End Function
    

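    Regarding the expression: it matches the opening tag, then a named group (pregroup) containing the text between the tags, then the closing tag. m.Value is the whole match including the tags, while m.Groups(grpName).Value is only the text in between.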

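    Regarding the Regex code: I have added the parameter includeTags so you can see the difference (False returns only the text between the tags, True returns the tags as well). Note that you should always set the RegexOptions deliberately, because they change how the expression is matched: IgnoreCase also accepts <PRE>, ExplicitCapture captures only the named group, and IgnorePatternWhitespace ignores unescaped whitespace inside the pattern. Multiline only changes the meaning of ^ and $, which this pattern does not use.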
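As one illustration of how much the options matter, the sketch below assumes the real page has <pre> blocks that span several lines, which the question does not confirm. The names DotAndNewlines and CountBlocks are hypothetical. RegexOptions.Multiline does not let . cross a line break; RegexOptions.Singleline does.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module DotAndNewlines
    ' Hypothetical helper: counts <pre>...</pre> blocks under the given options.
    Function CountBlocks(input As String, options As RegexOptions) As Integer
        Return Regex.Matches(input, "<pre>(.+?)</pre>", options).Count
    End Function

    Sub Main()
        ' Assumed input: a <pre> block whose content spans two lines.
        Dim page As String = "<pre>let x = 1" & vbLf & "let y = 2</pre>"

        ' Multiline only affects ^ and $, so . still stops at the line break.
        Console.WriteLine(CountBlocks(page, RegexOptions.Multiline))   ' 0

        ' Singleline makes . match the line break as well.
        Console.WriteLine(CountBlocks(page, RegexOptions.Singleline))  ' 1
    End Sub
End Module
```

If the input page really is multi-line, this could explain why a pattern that works on a one-line test string finds nothing in the program.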

    Finally, here's the main code:

    Sub Main
        dim input as string = "Some random markup <pre> and this stuff in the middle is what I'm after </pre> and there <pre> lots of these in one file </pre> which when I use Regexhero <pre> finds all the tags  </pre>"
        dim result = FindCode(input, "pre", false)
        dim count as integer = result.Count()
        Console.WriteLine(string.Format("Found string {0} times.", count))
        Console.WriteLine("Findings:")
        for each s in result
            Console.WriteLine(string.format("'{0}'", s.Value))
        next
    End Sub
    

    This will output:

    Found string 2 times.
    Findings:
    ' lots of these in one file '
    ' finds all the tags  '

    However, one question remains: why isn't the first <pre>...</pre> matched? Look at the substring "I'm after": it contains an apostrophe ('), which is neither whitespace (\s) nor a word character (\w), so the group cannot match across it. Extend the alternation to (\s|\w|') in the regular expression and all 3 strings are found.
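Listing every allowed character gets tedious once the content is real code full of punctuation. One alternative, offered as a sketch rather than as part of the answer above, is a negated character class: [^<]+ accepts anything except the start of the next tag. The names NegatedClass, FindInner and the group name inner are made up for this example, and the input is a shortened version of the sample string.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NegatedClass
    ' Hypothetical helper: returns the text between each <tagName>...</tagName> pair.
    Function FindInner(input As String, tagName As String) As List(Of String)
        ' Escape the tag name so it is always treated as literal text.
        Dim tag As String = Regex.Escape(tagName)
        ' [^<]+ accepts any character except "<", so apostrophes and other punctuation pass.
        Dim pattern As String = "<" & tag & ">(?<inner>[^<]+)</" & tag & ">"
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
            results.Add(m.Groups("inner").Value)
        Next
        Return results
    End Function

    Sub Main()
        Dim input As String = "a <pre> what I'm after </pre> b <pre> lots of these </pre> c <pre> all the tags </pre>"
        For Each s As String In FindInner(input, "pre")
            Console.WriteLine("'" & s & "'")
        Next
    End Sub
End Module
```

The trade-off is that [^<]+ stops at any "<" inside the block, so this only suits <pre> content without nested markup.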