regexvbareplacems-wordpci-compliance

Microsoft Word 2010 VBA: Using regex to find and replace


I'm trying to allow our staff to quickly mask the middle 8 digits of credit card data in old documents. I figure a macro using regex to do a find and replace is the fastest way for them. Masking is a PCI DSS requirement, and there may be hundreds or thousands of documents to mask the data in.

I've had some luck with the below code, however it will identify and modify strings that are beyond the 16 characters for a standard credit card and I'm not sure how to stop the false-positives. SOLVED by nhahtdh

The following identifies Visa, MasterCard and AmEx cards, with the above mentioned issue. However it could be improved by adding the regex for more card types, and adding common characters used to break up the long number.

The below code works, but could be improved. Can anyone help improve this by:

    Sub PCI_mask_card_numbers()
'
' This macro will search a document for numbers that look like Visa, MasterCard and AmEx credit card PANs and mask them with Xs
'
   Dim Counter As Long
   Dim Preexisting As Long

' Let the user know what's about to happen
    Dim Msg, Style, Title, Response, MyString
    Msg = "The macro will now attempt to mask all the credit card numbers it can identify.  e.g. 4444555566667777 will become 4444xxxxxxxx7777"
    Style = vbInformation
    Title = "PCI DSS - Credit Card Masking"
    Response = MsgBox(Msg, Style, Title)

' Count how many things already look like masked PANs so the final tally is correct
    Selection.HomeKey Unit:=wdStory
    With ActiveDocument.Content.Find
        Do While .Execute(FindText:="xxxx", Forward:=True, Format:=True, _
           MatchWholeWord:=True) = True
           Preexisting = Preexisting + 1
        Loop
    End With
    Preexisting = Preexisting / 2   ' because masks with a break were counted twice
    Selection.HomeKey Unit:=wdStory
    With ActiveDocument.Content.Find
        Do While .Execute(FindText:="xxxxxxxx", Forward:=True, Format:=True, _
           MatchWholeWord:=False) = True
           Preexisting = Preexisting + 1
        Loop
    End With

' ########  Start masking PANs  ###################################################

' Mastercard - 16 digits straight
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "<([4][0-9]{3})([0-9]{4})([0-9]{4})([0-9]{4})>"
        .Replacement.Text = "\1xxxxxxxx\4"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

' Visa - 16 digits straight
    With Selection.Find
        .Text = "<([5][0-9]{3})([0-9]{4})([0-9]{4})([0-9]{4})>"
        .Replacement.Text = "\1xxxxxxxx\4"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

' AmEx - 15 digits straight
    With Selection.Find
        .Text = "<([3][0-9]{2})([0-9]{4})([0-9]{4})([0-9]{4})>"
        .Replacement.Text = "\1xxxxxxxx\4"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

' Visa and Mastercard - PAN broken up by :
    With Selection.Find
        .Text = "<([4][0-9]{3})(:[0-9]{4}:[0-9]{4}:)([0-9]{4})>"
        .Replacement.Text = "\1:xxxx:xxxx:\3"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        .Text = "<([5][0-9]{3})(:[0-9]{4}:[0-9]{4}:)([0-9]{4})>"
        .Replacement.Text = "\1:xxxx:xxxx:\3"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

' Visa and Mastercard - PAN broken up by .
    With Selection.Find
        .Text = "<([5][0-9]{3})(.[0-9]{4}.[0-9]{4}.)([0-9]{4})>"
        .Replacement.Text = "\1.xxxx.xxxx.\3"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        .Text = "<([4][0-9]{3})(.[0-9]{4}.[0-9]{4}.)([0-9]{4})>"
        .Replacement.Text = "\1.xxxx.xxxx.\3"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

' Visa and Mastercard - PAN broken up by spaces
    With Selection.Find
        .Text = "<([4][0-9]{3})( [0-9]{4} [0-9]{4} )([0-9]{4})>"
        .Replacement.Text = "\1 xxxx xxxx \3"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        .Text = "<([5][0-9]{3})( [0-9]{4} [0-9]{4} )([0-9]{4})>"
        .Replacement.Text = "\1 xxxx xxxx \3"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

' Visa and Mastercard - PAN broken up by -
    With Selection.Find
        .Text = "<([5][0-9]{3})(-[0-9]{4}-[0-9]{4}-)([0-9]{4})>"
        .Replacement.Text = "\1-xxxx-xxxx-\3"
        .Forward = True
        .Wrap = wdFindAsk
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        .Text = "<([4][0-9]{3})(-[0-9]{4}-[0-9]{4}-)([0-9]{4})>"
        .Replacement.Text = "\1-xxxx-xxxx-\3"
        .Forward = True
        .Wrap = wdFindAsk
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.HomeKey Unit:=wdStory

' ########  Done masking PANs  ###################################################

' Count how many changes were done
    Selection.HomeKey Unit:=wdStory
    With ActiveDocument.Content.Find
        Do While .Execute(FindText:="xxxx", Forward:=True, Format:=True, _
           MatchWholeWord:=True) = True
           Counter = Counter + 1
        Loop
    End With
    Counter = Counter / 2   ' because masks with a break were counted twice
    Selection.HomeKey Unit:=wdStory
    With ActiveDocument.Content.Find
        Do While .Execute(FindText:="xxxxxxxx", Forward:=True, Format:=True, _
           MatchWholeWord:=False) = True
           Counter = Counter + 1
        Loop
    End With
    Counter = Counter – Preexisting   ' New masks less previous mask-like data

' Let the user know the job is done
    Msg = "The macro has masked " & Str$(Counter) & " credit cards. Check the results and save the file if the changes are correct. If there are issues with the masking changes, do not save the file and consult the IT team."
    Style = vbInformation
    Title = "PCI DSS - Credit Card Masking." & Str$(Counter) & " cards masked"
    Response = MsgBox(Msg, Style, Title)
End Sub

Solution

  • Since it seems like you are using the Word wildcard syntax, you probably can use <, which asserts beginning of word, and >, which asserts end of word to prevent the pattern from matching when the text is preceded or succeeded by letters or numbers (which is how it seems to work from some simple testing).

    Using

    "([4][0-9]{3})(-[0-9]{4}-[0-9]{4}-)([0-9]{4})"
    

    as example, modify it into

    "<([4][0-9]{3})(-[0-9]{4}-[0-9]{4}-)([0-9]{4})>"