I'm trying to parse a Word .docx file, but when I call
xmlDocument.rootElement()?.elements(forName: "w:t")
I get an empty array back.
Here's the relevant code:
func readDocXFile(at url: URL) {
    do {
        let fileManager = FileManager()
        let tempDirectoryURL = fileManager.temporaryDirectory.appendingPathComponent(UUID().uuidString)
        try fileManager.createDirectory(at: tempDirectoryURL, withIntermediateDirectories: true, attributes: nil)
        try fileManager.unzipItem(at: url, to: tempDirectoryURL)
        let documentXMLURL = tempDirectoryURL.appendingPathComponent("word/document.xml")
        let xmlData = try Data(contentsOf: documentXMLURL)
        let xmlDocument = try XMLDocument(data: xmlData)
        let text = extractText(from: xmlDocument)
        print(text)
        // Clean up the temporary directory
        try fileManager.removeItem(at: tempDirectoryURL)
    } catch {
        print("Failed to read DOCX file: \(error.localizedDescription)")
    }
}

func extractText(from xmlDocument: XMLDocument) -> String {
    let elements = xmlDocument.rootElement()?.elements(forName: "w:t") ?? []
    print(xmlDocument.rootElement())
    let text = elements.compactMap { $0.stringValue }.joined(separator: " ")
    return text
}
Printing the root element shows that the XML does contain w:t elements:
...
<w:t xml:space="preserve"> Developer • Information Designer</w:t>
...
Any ideas where I am going wrong here?
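elements(forName:) only searches the immediate children of the element it is called on. In word/document.xml the w:t elements are nested several levels below the root (inside w:body, w:p, and w:r), so calling it on the root element finds nothing. To find all <w:t> nodes in the document, a convenient way is to use an XPath expression: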
let elements = (try? xmlDocument.rootElement()?.nodes(
    forXPath: "descendant::w:t"
)) ?? []
The descendant axis matches all descendants of the node at any depth, which is what you want.
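To see the difference between the two lookups in isolation, here is a minimal sketch that runs both against a tiny in-memory document. The sampleXML string is a hypothetical, heavily trimmed stand-in for word/document.xml, not the asker's actual file, and extractText(from:) simply mirrors the function from the question with the XPath lookup swapped in.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLDocument lives in this module on Linux
#endif

// Hypothetical, trimmed-down stand-in for word/document.xml (assumption for illustration).
let sampleXML = """
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Developer</w:t></w:r></w:p>
    <w:p><w:r><w:t>Information Designer</w:t></w:r></w:p>
  </w:body>
</w:document>
"""

// Same shape as the question's function, but using the descendant axis
// so that w:t elements at any depth below the root are matched.
func extractText(from xmlDocument: XMLDocument) -> String {
    guard let root = xmlDocument.rootElement() else { return "" }
    let nodes = (try? root.nodes(forXPath: "descendant::w:t")) ?? []
    return nodes.compactMap { $0.stringValue }.joined(separator: " ")
}

do {
    let document = try XMLDocument(xmlString: sampleXML, options: [])

    // Direct children of the root only: the root's sole child is w:body,
    // so this lookup comes back empty, as in the question.
    let direct = document.rootElement()?.elements(forName: "w:t") ?? []
    print("direct children named w:t:", direct.count)

    // Descendant lookup: reaches the w:t elements nested inside w:p and w:r.
    print("text via XPath:", extractText(from: document))
} catch {
    print("Failed to parse sample XML: \(error.localizedDescription)")
}
```

Joining the runs with a space follows the question's code; real documents often split a single word across several w:t runs, so a different separator may suit your data better.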