I am using Office 2013 and I want to use regular expressions to modify all headings in a Word document by removing the part before the heading that consists of numbers, spaces, and punctuation. For example, "Heading 1.1.1.*.Title1.1.1" should become "*.Title1.1.1" after processing. I also need to keep the heading levels unchanged.
I have used the following program, but it is causing an infinite loop. I also tried putting the headings into a list and using Find with wildcards (<[0-9\s.,]*), but I did not get the correct results. What is wrong, and how should I fix it?
Here is the code:
Sub CleanTitlesUsingRegex()
Dim para As Paragraph
Dim title As String
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
With regex
.Pattern = "^[\d\s.,]+"
.Global = True
End With
For Each para In ActiveDocument.Paragraphs
If para.Style Like "Heading*" Then
title = para.Range.Text
para.Range.Text = regex.Replace(title, "")
End If
Next para
End Sub
Sub CleanTitlesUsingFind()
Dim myfind As find
Set myfind = ActiveDocument.Content.find
Dim titleStyles As Variant
titleStyles = Array("Heading 1", "Heading 2", "Heading 3")
Dim styleName As Variant
For Each styleName In titleStyles
With myfind
.ClearFormatting
.text = "<[0-9\s.,、]*"
.Replacement.text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
.style = styleName
.Execute Replace:=wdReplaceAll
End With
Next
End Sub
The infinite loop results from the fact that you are replacing the entire range of the paragraph, including the paragraph mark. That seems to be enough to cause Word's For Each
to "get lost" so the code repeatedly processes the same paragraph.
This type of problem is common when working with the Word object model. A traditional solution is to avoid For Each
and process the paragraphs in reverse order, using something more like this:
Dim i As Long
For i = ActiveDocument.Paragraphs.Count To 1 Step -1
With ActiveDocument.Paragraphs(i)
' do your thing
End With
Next
Or here, you can probably modify the Range to exclude the paragraph mark and do something like this:
Dim rng As Object ' Word.Range
For Each para In ActiveDocument.Paragraphs
If para.Style Like "Heading*" Then
Set rng = para.Range
rng.End = rng.End - 1
title = rng.Text
rng.Text = regex.Replace(title, "")
End If
Next
Also, I'm not sure what you're actually trying to match because if you have Heading 1.1.1*Something 1.1.1
and you just want to remove the digits/./,/spaces immediately before the * then I think you need a Patterm nore like [\d\s.,]+\*
. (Although that would remove the *).
It may well be possible to do the same thing without VBScript.RegExp using a Word Wildcard replace, but I think you need to clarify exactly what you need to search for and replace. (Plus, I am sure there are other people here who are better at Regexp and Word Find/Replace patterns than me).