Speed of VB.net Regex?


Is the regex code in VB.net known to be slow?

I took over some code that cleans large amounts of text data. The code ran fairly slowly, so I looked for ways to speed it up. I found a couple of frequently called functions that I thought might be part of the problem.

Here's the original code for cleaning a phone number:

        Dim strArray() As Char = strPhoneNum.ToCharArray
        Dim strNewPhone As String = ""
        Dim i As Integer

        For i = 0 To strArray.Length - 1
            If strArray.Length = 11 And strArray(0) = "1" And i = 0 Then
                Continue For
            End If

            If IsNumeric(strArray(i)) Then
                strNewPhone = strNewPhone & strArray(i)
            End If
        Next

        If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
            Return strNewPhone
        End If

I rewrote the code to eliminate the array and looping using regex.

        Dim strNewPhone As String = ""
        strNewPhone = Regex.Replace(strPhoneNum, "\D", "")
        If strNewPhone = "" OrElse strNewPhone.Substring(0, 1) <> "1" Then
            Return strNewPhone
        Else
            strNewPhone = Mid(strNewPhone, 2)
        End If

        If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
            Return strNewPhone
        End If

After running a couple of tests, the new code is significantly slower than the old. Is regex in VB.net slow, did I introduce some other problem, or was the original code just fine the way it was?


Solution

  • I ran some tests with the Visual Studio Profiler and did not get the same results you did. There was a logical error in your Regex function that caused the length check to be skipped if the number didn't begin with 1. I corrected that in my tests.

    1. I noticed in my tests that whichever functions ran first and last suffered a penalty, so I executed each function independently and ran a priming function beforehand.
    2. Depending on the test, I executed each function either 10,000 or 100,000 times with phone-number-like patterns of varying length. Each method received the same numbers.
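
The two steps above (an untimed priming run, then many timed iterations over the same inputs) can be sketched roughly as follows. This is not the harness used for the results below, which came from the Visual Studio Profiler; the names `CleanPhone` and `TimeIt`, the sample inputs, and the iteration count are assumptions for illustration only.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module PhoneBenchmarkSketch
    ' Hypothetical stand-in for whichever cleaning function is under test.
    Function CleanPhone(strPhoneNum As String) As String
        Dim digits As String = Regex.Replace(strPhoneNum, "\D", "")
        If digits.Length >= 7 AndAlso digits(0) = "1"c Then
            digits = digits.Remove(0, 1)
        End If
        Return If(digits.Length = 7 OrElse digits.Length = 10, digits, "")
    End Function

    ' Runs one untimed priming pass, then times `iterations` passes over the inputs.
    Function TimeIt(cleaner As Func(Of String, String), inputs As String(), iterations As Integer) As TimeSpan
        ' Priming pass: absorbs one-time costs such as JIT compilation
        ' and the first construction of the regex.
        For Each s As String In inputs
            cleaner.Invoke(s)
        Next

        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            For Each s As String In inputs
                cleaner.Invoke(s)
            Next
        Next
        sw.Stop()
        Return sw.Elapsed
    End Function

    Sub Main()
        ' Assumed sample inputs of varying length; every function under test
        ' should be given the same array.
        Dim inputs As String() = {"1 (555) 123-4567", "555-1234", "555.123.4567", "12345"}
        Dim elapsed As TimeSpan = TimeIt(AddressOf CleanPhone, inputs, 10000)
        Console.WriteLine("Elapsed: {0} ms", elapsed.TotalMilliseconds)
    End Sub
End Module
```

Running each candidate in its own process, with the same inputs, keeps one function's warm-up cost from being charged to another.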

    Results

    Across the tests, my method was consistently the fastest, though only slightly ahead of your revised function.

    1. In a quick timer test, the Original function took twice as long.
    2. The Profiler showed the Original Method used about 60% more memory than our methods.
    3. The Profiler showed the Original Method took eight times as long to run.
    4. The Profiler showed the Original Method took about 40% more processor cycles.

    My Conclusion

    In all tests the Original method was much slower. Had it come out ahead in even one test, I might be able to explain our discrepancy. If you test those methods in total isolation, I think you will come up with something similar.

    My best guess is that something else was affecting your results, and that your assessment that the Original method was better is mistaken.

    Your Revised Function

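    ' Requires: Imports System.Text.RegularExpressions
    Function GetPhoneNumberRegex(strPhoneNum As String) As String
        ' Strip every non-digit character.
        Dim strNewPhone As String = Regex.Replace(strPhoneNum, "\D", "")
        ' AndAlso short-circuits, so Substring is never called on an empty string.
        If strNewPhone <> "" AndAlso strNewPhone.Substring(0, 1) = "1" Then
            strNewPhone = Mid(strNewPhone, 2)
        End If
    
        ' Accept only 7- or 10-digit results.
        If Len(strNewPhone) = 7 OrElse Len(strNewPhone) = 10 Then
            Return strNewPhone
        End If
    
        Return ""
    End Function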
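
The logical error in the question's regex version shows up with a short input. Below is a minimal sketch comparing the early-return flow with the corrected flow. The names `CleanEarlyReturn` and `CleanCorrected` and the sample input are hypothetical. Returning `""` on the final path of the early-return version is also an assumption, since the question's snippet does not show what happens there.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module EarlyReturnSketch
    ' Mirrors the question's logic: returns before the length check
    ' whenever the digits do not start with "1".
    Function CleanEarlyReturn(strPhoneNum As String) As String
        Dim strNewPhone As String = Regex.Replace(strPhoneNum, "\D", "")
        If strNewPhone = "" OrElse strNewPhone.Substring(0, 1) <> "1" Then
            Return strNewPhone   ' length is never validated on this path
        End If
        strNewPhone = Mid(strNewPhone, 2)
        ' Assumption: return "" when the length is neither 7 nor 10.
        Return If(Len(strNewPhone) = 7 OrElse Len(strNewPhone) = 10, strNewPhone, "")
    End Function

    ' Corrected flow: strip an optional leading "1", then always check the length.
    Function CleanCorrected(strPhoneNum As String) As String
        Dim strNewPhone As String = Regex.Replace(strPhoneNum, "\D", "")
        If strNewPhone <> "" AndAlso strNewPhone.Substring(0, 1) = "1" Then
            strNewPhone = Mid(strNewPhone, 2)
        End If
        Return If(Len(strNewPhone) = 7 OrElse Len(strNewPhone) = 10, strNewPhone, "")
    End Function

    Sub Main()
        Dim sample As String = "555-12"   ' too short to be a valid number
        Console.WriteLine("[{0}]", CleanEarlyReturn(sample))   ' prints [55512]
        Console.WriteLine("[{0}]", CleanCorrected(sample))     ' prints []
    End Sub
End Module
```

Because the two versions do different amounts of work on inputs that do not start with 1, timing them against each other only makes sense once both apply the same validation.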
    

    My Function

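    ' Requires: Imports System.Text.RegularExpressions
    Function GetPhoneNumberMine(strPhoneNum As String) As String
        ' Strip every non-digit character.
        Dim strNewPhone As String = Regex.Replace(strPhoneNum, "\D", "")
        ' AndAlso short-circuits, so strNewPhone(0) is never read on an empty string.
        If strNewPhone.Length >= 7 AndAlso strNewPhone(0) = "1" Then
            strNewPhone = strNewPhone.Remove(0, 1)
        End If
    
        ' Return the digits only when 7 or 10 remain; otherwise return "".
        Return If(strNewPhone.Length = 7 OrElse strNewPhone.Length = 10, strNewPhone, "")
    End Function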
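
When a length test guards an index access such as `strNewPhone(0)`, the choice between `And` and `AndAlso` matters for inputs that contain no digits. `And` evaluates both operands, while `AndAlso` skips the second when the first is False. The sketch below illustrates the difference. The helper names `HasLeadingOne` and `HasLeadingOneEager` are hypothetical and not part of the functions above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ShortCircuitSketch
    ' Hypothetical helper: True when a leading "1" should be stripped.
    Function HasLeadingOne(digits As String) As Boolean
        ' AndAlso skips the index access when the length test fails,
        ' so an empty string is safe here.
        Return digits.Length >= 7 AndAlso digits(0) = "1"c
    End Function

    ' Same test written with And, which evaluates both operands.
    Function HasLeadingOneEager(digits As String) As Boolean
        Return digits.Length >= 7 And digits(0) = "1"c
    End Function

    Sub Main()
        ' An input with no digits collapses to the empty string.
        Dim digits As String = Regex.Replace("n/a", "\D", "")
        Console.WriteLine(HasLeadingOne(digits))   ' prints False
        Try
            Console.WriteLine(HasLeadingOneEager(digits))
        Catch ex As IndexOutOfRangeException
            Console.WriteLine("And evaluated digits(0) on an empty string")
        End Try
    End Sub
End Module
```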