javascriptxmlxpathopenxmlspreadsheetml

How to use XPath in JavaScript to select from namespaced OOXML?


I am building a tool to get data from a user specified XML file with an XPath expression. Simple XML files and an XML file with a single namespace works fine but I do not get the following XML file to work where the same prefix is overridden at different hierarchal levels:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>Microsoft Office User</Author>
  <LastAuthor>Microsoft Office User</LastAuthor>
  <Created>2019-12-03T15:40:13Z</Created>
  <Version>16.00</Version>
 </DocumentProperties>
 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  <AllowPNG/>
 </OfficeDocumentSettings>
 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>13820</WindowHeight>
  <WindowWidth>23740</WindowWidth>
  <WindowTopX>1520</WindowTopX>
  <WindowTopY>1620</WindowTopY>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
 </ExcelWorkbook>
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="12" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Blad1">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="8" x:FullColumns="1"
   x:FullRows="1" ss:DefaultColumnWidth="65" ss:DefaultRowHeight="16">
   <Row>
    <Cell><Data ss:Type="String">dfgdfgdfg</Data></Cell>
    <Cell><Data ss:Type="Number">1150</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">werwerwe</Data></Cell>
    <Cell><Data ss:Type="Number">889</Data></Cell>
   </Row>
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <Selected/>
   <Panes>
    <Pane>
     <Number>3</Number>
     <RangeSelection>R1C1:R8C2</RangeSelection>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
</Workbook>

The xpath I am testing with is

/Workbook[@xmlns="urn:schemas-microsoft-com:office:spreadsheet"]/Worksheet[@ss:Name="Blad1"]/Table[@ss:ExpandedColumnCount="2"]/Row[1]/Cell[2]/Data[@ss:Type="Number"]/text()

and this is the code for parsing and searching for the node:

// downloadedData is a string with the xml, xPath is the string with the xpath

var parser, xmlDoc;

parser = new DOMParser();
xmlDoc = parser.parseFromString(downloadedData, "text/xml");

var xmlEvaluator = new XPathEvaluator();
var xmlResolver = xmlEvaluator.createNSResolver(xmlDoc);
var node = xmlEvaluator.evaluate(xPath, xmlDoc, xmlResolver, XPathResult.FIRST_ORDERED_NODE_TYPE, null);

/// node always empty here

edit: removed typo in code (unrelated to the real issue)


Solution

  • @MichaelKay is right in pointing out that namespaces are not attributes. Here are some supporting details for making the necessary adjustments so that your XPath will work...

    1. Fix an unrelated bug. Change

      xmlDoc = parser.parseFromString(downloadedData, ''), "text/xml");
      

      which is syntactically incorrect to

      xmlDoc = parser.parseFromString(downloadedData, 'text/xml');
      
    2. Observe that your XML already declares namespace prefixes for all needed components, so you won't have to create a custom nsResolver(); calling xmlEvaluator.createNSResolver() as you're doing will suffice.

    3. Fix your XPath to use namespace prefixes properly:

      xPath = '/ss:Workbook/ss:Worksheet[@ss:Name="Blad1"]/ss:Table[@ss:ExpandedColumnCount="2"]/ss:Row[1]/ss:Cell[2]/ss:Data[@ss:Type="Number"]/text()';
      

    Your code will now function properly, allowing XPath selection against your OOXML.

    See also How does XPath deal with XML namespaces?