regexvbams-accessstring-operationsinstr

Can I use regular expressions, the Like operator, and/or Instr() find the index of a pattern within a larger string?


I have a large list (a table with one field) of non-standardized strings, imported from a poorly managed legacy database. I need to extract the single-digit number (surrounded by spaces) that occurs exactly once in each of those strings (though the strings have other multi-digit numbers sometimes too). For example, from the following string:

"Quality Assurance File System And Records Retention Johnson, R.M. 004 4 2999 ss/ds/free ReviewMo = Aug Effective 1/31/2012 FileOpen-?"

I would want to pull the number 4 (or 4's position in the string, i.e. 71)

I can use

WHERE rsLegacyList.F1 LIKE "* # *" 

inside a select statement to find if each string has a lone digit, and thereby filter my list. But it doesn't tell me where the digit is so I can extract the digit itself (with mid() function) and start sorting the list. The goal is to create a second field with that digit by itself as a method of sorting the larger strings in the first field.

Is there a way to use Instr() along with regular expressions to find where a regular expression occurs within a larger string? Something like

intMarkerLocation = instr(rsLegacyList.F1, Like "* # *")

but that actually works?

I appreciate any suggestions, or workarounds that avoid the problem entirely.


@Lee Mac, I made a function RegExFindStringIndex as shown here:

Public Function RegExFindStringIndex(strToSearch As String, strPatternToMatch As String) As Integer

    Dim regex                       As RegExp
    Dim Matching                    As Match

    Set regex = New RegExp

    With regex
        .MultiLine = False
        .Global = True
        .IgnoreCase = False
        .Pattern = strPatternToMatch
        Matching = .Execute(strToSearch)
        RegExFindStringIndex = Matching.FirstIndex
    End With

    Set regex = Nothing
    Set Matching = Nothing
End Function

But it gives me an error Invalid use of property at line Matching = .Execute(strToSearch)


Solution

  • Using Regular Expressions

    If you were to use Regular Expressions, you would need to define a VBA function to instantiate a RegExp object, set the pattern property to something like \s\d\s (whitespace-digit-whitespace) and then invoke the Execute method to obtain a match (or matches), each of which will provide an index of the pattern within the string. If you want to pursue this route, here are some existing examples for Excel, but the RegExp manipulation will be identical in MS Access.

    Here is an example function demonstrating how to use the first result returned by the Execute method:

    Public Function RegexInStr(strStr As String, strPat As String) As Integer
        With New RegExp
            .Multiline = False
            .Global = True
            .IgnoreCase = False
            .Pattern = strPat
            With .Execute(strStr)
                If .Count > 0 Then RegexInStr = .Item(0).FirstIndex + 1
            End With
        End With
    End Function
    

    Note that the above uses early binding and so you will need to add a reference to the Microsoft VBScript Regular Expressions 5.5 library to your project.

    Example Immediate Window evaluation:

    ?InStr("abc 1 123", " 1 ")
     4 
    
    ?RegexInStr("abc 1 123", "\s\w\s")
     4 
    

    Using InStr

    An alternative using the in-built instr function within a query might be the following inelegant (and probably very slow) query:

    select
        switch
        (
            instr(rsLegacyList.F1," 0 ")>0,instr(rsLegacyList.F1," 0 ")+1,
            instr(rsLegacyList.F1," 1 ")>0,instr(rsLegacyList.F1," 1 ")+1,
            instr(rsLegacyList.F1," 2 ")>0,instr(rsLegacyList.F1," 2 ")+1,
            instr(rsLegacyList.F1," 3 ")>0,instr(rsLegacyList.F1," 3 ")+1,
            instr(rsLegacyList.F1," 4 ")>0,instr(rsLegacyList.F1," 4 ")+1,
            instr(rsLegacyList.F1," 5 ")>0,instr(rsLegacyList.F1," 5 ")+1,
            instr(rsLegacyList.F1," 6 ")>0,instr(rsLegacyList.F1," 6 ")+1,
            instr(rsLegacyList.F1," 7 ")>0,instr(rsLegacyList.F1," 7 ")+1,
            instr(rsLegacyList.F1," 8 ")>0,instr(rsLegacyList.F1," 8 ")+1,
            instr(rsLegacyList.F1," 9 ")>0,instr(rsLegacyList.F1," 9 ")+1,
            true, null
        ) as intMarkerLocation
    from
        rsLegacyList
    where 
        rsLegacyList.F1 like "* # *"