I have a question about XML nodes.
I want to replace both of values and formulas in Excel file, so first, I changed it to zip file using file.rename()
, after, I work in XML file of zip file.
You can find XML in /xl/worksheets/sheet1.xml
, to replace, I write code lines:
library(XML)
### Change excel to zip
data_path <- "Untitled 1.xlsx"
file.rename(data_path, "Untitled 1.zip")
dir.create("Untitled 1")
utils::unzip("Untitled 1.zip", exdir = "Untitled 1")
### Read XML file
doc <- xmlTreeParse("Untitled 1/xl/worksheets/sheet1.xml", useInternal = TRUE)
#### Select the nodes we want to update
nodes <- getNodeSet(doc, "/worksheet")
nodes <- getNodeSet(doc, "/worksheet/sheetData")
# > list()
# > list()
When I get nodes, all of them return blank list nodes. It worked with getNodeSet(doc, "/")
, Xpath expression is /
, it return list having length = 1. I don't understand why that? What wrong is XML file? I have just tested in https://www.freeformatter.com/xpath-tester.html, it still return right result if Xpath expression is /worksheet
or /worksheet/sheetData
Can you explain the problem is? Thank you.
I attached the excel file, output zip file and XML file in here
Your document has a default namespace given in the root "worksheet" node. The XML
library needs the default namespaces to be named. You can do something like this
# name the namespaces
ns <- c(ns="http://schemas.openxmlformats.org/spreadsheetml/2006/main",
r="http://schemas.openxmlformats.org/officeDocument/2006/relationships")
# use the ns= namespace in the xpath
getNodeSet(doc, "/ns:worksheet", namespaces = ns)
getNodeSet(doc, "/ns:worksheet/ns:sheetData", namespaces = ns)
I know with xml2
, it's possible to strip the default namespace if you don't want to bother with it. You could do
library(xml2)
doc <- xml_ns_strip(read_xml("sheet1.xml"))
xml_find_all(doc, "/worksheet")
xml_find_all(doc, "/worksheet/sheetData")