Does anyone know how when programmatically iterating through a word document, you can tell if a paragraph forms part of a table of contents (or indeed, anything else that forms part of a field).
My reason for asking is that I have a VB program that is supposed to extract the first couple of paragraphs of substantive text from a document - it's doing so by iterating through the Word.Paragraphs collection. I don't want the results to include tables of contents or other fields, I only want stuff that a human being would recognize as a header, title or a normal text paragraph. However it turns out that if there's a table of contents, then not only the table of contents itself but EVERY line in the table of contents appears as a separate item in Word.Paragraphs. I don't want these but haven't been able to find any property on the Paragraph object that would allow me to distinguish and so ignore them (I'm guessing I need the solution to apply to other field types too, like table of figures and table of authorities, which I haven't yet actually encountered but I guess potentially would cause the same problem)
Because of the limitations in the Word object model I think the best way to achieve this would be to temporarily remove the TOC field code, iterate through the Word document, and then re-insert the TOC. In VBA, it would look like this:
Dim doc As Document
Dim fld As Field
Dim rng As Range
Set doc = ActiveDocument
For Each fld In doc.Fields
If fld.Type = wdFieldTOC Then
fld.Select
Selection.Collapse
Set rng = Selection.Range 'capture place to re-insert TOC later
fld.Cut
End If
Next
Iterate through the code to extract paragraphs and then
Selection.Range = rng
Selection.Paste
If you are coding in .NET this should translate pretty closely. Also, this should work for Word 2003 and earlier as is, but for Word 2007/2010 the TOC, depending on how it is created, sometimes has a Content Control-like region surrounding it that may require you to write additional detect and remove code.