swiftxmldocx

Swift not finding XML elements


I'm trying to parse a Word docx file but when I use the:

xmlDocument.rootElement()?.elements(forName: "w:t")

I am getting an empty array returned.

Here's my relevant code:

func readDocXFile(at url: URL) {
        
        do {
            let fileManager = FileManager()
            let tempDirectoryURL = fileManager.temporaryDirectory.appendingPathComponent(UUID().uuidString)
            try fileManager.createDirectory(at: tempDirectoryURL, withIntermediateDirectories: true, attributes: nil)
            
            try fileManager.unzipItem(at: url, to: tempDirectoryURL)
            
            let documentXMLURL = tempDirectoryURL.appendingPathComponent("word/document.xml")
            let xmlData = try Data(contentsOf: documentXMLURL)
            let xmlDocument = try XMLDocument(data: xmlData)
            let text = extractText(from: xmlDocument)
            print(text)
            
            // Clean up the temporary directory
            try fileManager.removeItem(at: tempDirectoryURL)
            
        } catch {
            print("Failed to read DOCX file: \(error.localizedDescription)")
        }
    }
    
    func extractText(from xmlDocument: XMLDocument) -> String {
        let elements = xmlDocument.rootElement()?.elements(forName: "w:t") ?? []
        print(xmlDocument.rootElement())
        let text = elements.compactMap { $0.stringValue }.joined(separator: " ")
        return text
    }

The print of the rootElement shows that there are w:t elements in the XML:

...
<w:t xml:space="preserve">  Developer  •  Information Designer</w:t>
...

Any ideas where I am going wrong here?


Solution

  • elements(forName:) only searches in the immediate children of the node. To find all <w:t> nodes in the document, a convenient way is to use an XPath expression.

    let elements = (try? xmlDocument.rootElement()?.nodes(
        forXPath: "descendant::w:t"
    )) ?? []
    

    The descendant axis searches for all the descendants of the node, which is what you want.