vba ms-word fieldcodes

Distinguishing Table of Contents in Word document


Does anyone know how, when programmatically iterating through a Word document, you can tell whether a paragraph forms part of a table of contents (or indeed of any other field)?

My reason for asking is that I have a VB program that is supposed to extract the first couple of paragraphs of substantive text from a document. It does so by iterating through the Word.Paragraphs collection. I don't want the results to include tables of contents or other fields; I only want text that a human being would recognize as a header, title or normal text paragraph. However, it turns out that if there's a table of contents, every line in it appears as a separate item in Word.Paragraphs. I don't want these, but I haven't been able to find any property on the Paragraph object that would let me distinguish them and so ignore them. (I'm guessing I need the solution to apply to other field types too, such as a table of figures or a table of authorities, which I haven't actually encountered yet but which would presumably cause the same problem.)


Solution

  • Because of limitations in the Word object model, I think the best way to achieve this is to temporarily remove the TOC field, iterate through the Word document, and then re-insert the TOC. In VBA, it would look like this:

    Dim doc As Document
    Dim fld As Field
    Dim rng As Range
    
    Set doc = ActiveDocument
    
    For Each fld In doc.Fields
        If fld.Type = wdFieldTOC Then
            fld.Select
            Selection.Collapse
            Set rng = Selection.Range 'capture place to re-insert TOC later
            fld.Cut 'moves the whole TOC field to the Clipboard
            Exit For 'the Clipboard holds one item, so handle a single TOC
        End If
    Next
    

    Iterate through the document to extract the paragraphs, and then paste the TOC back where it was:

    'Selection.Range is read-only, so paste directly into the saved range
    If Not rng Is Nothing Then rng.Paste
    

    If you are coding in .NET, this should translate pretty closely. It should work as is in Word 2003 and earlier. In Word 2007/2010, however, a TOC is sometimes wrapped in a Content Control-like region, depending on how it was created, so you may need to write additional code to detect and remove that wrapper.
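
    A non-destructive alternative, offered only as a sketch: instead of cutting the TOC, skip every paragraph that overlaps the result range of a TOC, table-of-authorities or index field. This is a different technique from the cut-and-paste approach above. IsGeneratedTable and ExtractFirstParagraphs are hypothetical names, the limit of two paragraphs is an assumption based on "the first couple of paragraphs", and the code has not been checked against every Word version.

    ```vba
    ' Hypothetical helper: True when the paragraph overlaps the result of a
    ' TOC, table-of-authorities or index field. A table of figures is
    ' normally a TOC field too, so wdFieldTOC should cover it.
    Function IsGeneratedTable(ByVal para As Paragraph, ByVal doc As Document) As Boolean
        Dim fld As Field
        Dim res As Range

        For Each fld In doc.Fields
            Select Case fld.Type
                Case wdFieldTOC, wdFieldTOA, wdFieldIndex
                    Set res = fld.Result
                    ' Overlap test rather than InRange: the first and last
                    ' entries share a paragraph with the field's own markers.
                    If para.Range.Start < res.End And para.Range.End > res.Start Then
                        IsGeneratedTable = True
                        Exit Function
                    End If
            End Select
        Next fld
    End Function

    ' Hypothetical driver: prints the first WANTED non-empty paragraphs
    ' that are not part of a generated table.
    Sub ExtractFirstParagraphs()
        Const WANTED As Long = 2   ' assumption: "first couple of paragraphs"
        Dim doc As Document
        Dim para As Paragraph
        Dim found As Long

        Set doc = ActiveDocument
        For Each para In doc.Paragraphs
            ' Ignore paragraphs that contain only the paragraph mark or spaces
            If Len(Trim$(Replace(para.Range.Text, vbCr, ""))) > 0 Then
                If Not IsGeneratedTable(para, doc) Then
                    Debug.Print para.Range.Text
                    found = found + 1
                    If found >= WANTED Then Exit For
                End If
            End If
        Next para
    End Sub
    ```

    Because nothing is cut or pasted, the document and the Clipboard are left untouched. The trade-off is that every paragraph is checked against every field, which may be slow in documents with many fields.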