Tags: xml, xml-parsing, apache-nifi, hortonworks-dataflow

NiFi: get all elements of the same tag in XML using EvaluateXpath processor


I am trying to parse the XML below in NiFi. I would like to extract all the Ids and make a separate web service call for each one.

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <soap:Header>   
   </soap:Header>
   <soap:Body>
      <store-Ids>
            <Id>69E32281-0484</Id>
            <Id>3002AFCD-B494</Id>
            <Id>2C9E17AC-9D97</Id>
            <Id>98E8EB10-7D6A</Id>
            <Id>F8D5F93C-1455</Id>
            <Id>98655C3F-B58C</Id>
            <Id>8AE4FD0A-6000</Id>
            <Id>E56FE4CA-0D83</Id>
         </store-Ids>
   </soap:Body>
</soap:Envelope>

Is there a way to extract all the values inside the Id tags, either as an array (69E32281-0484, 3002AFCD-B494, ...) or as a single string (69E32281-0484 3002AFCD-B494 2C9E17AC-9D97 ...), using the EvaluateXPath or EvaluateXQuery processors?

//*[local-name()='Id']/text()      gives me only the 1st Id
//*[local-name()='Id'][2]/text()   gives me the 2nd Id, and so on
//Id                               returns "Empty string set"

Because the number of Ids is dynamic, it is not possible to hard-code a position such as [1], [2], [3], ... to get the value of each Id.

PS: There are many other ways to get this done in NiFi, but I would like to know whether the EvaluateXPath processor can read the XML and return all the Id values as an array or as text.

Related links

1) https://community.hortonworks.com/questions/101922/how-to-use-evaluatexpath-to-get-xml-roots-attribut.html

2) https://community.hortonworks.com/questions/140605/evaluatexpath-cant-return-multiple-node-values.html


Solution

  • Currently, EvaluateXPath only allows a single element in the Nodeset, even when the destination is flowfile-content. I have filed an improvement Jira (NIFI-5187) to add support for Nodesets with multiple elements.

    As a workaround, you can use EvaluateXQuery with //*/Id, which will issue one flow file for each of your Ids. You can then process each flow file individually, calling whichever web services you like.
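
    To check outside NiFi which nodes a "select every Id" query should match, here is a minimal sketch in plain Python using the standard library's xml.etree.ElementTree. It does not reproduce the behavior of EvaluateXPath or EvaluateXQuery. It only shows the set of values you would expect to get back, one per flow file, from a shortened copy of the sample document. The names SOAP_XML and extract_ids are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question (assumption:
# the Id elements carry no namespace, as in the original XML).
SOAP_XML = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Header/>
   <soap:Body>
      <store-Ids>
         <Id>69E32281-0484</Id>
         <Id>3002AFCD-B494</Id>
         <Id>2C9E17AC-9D97</Id>
      </store-Ids>
   </soap:Body>
</soap:Envelope>"""


def extract_ids(xml_text):
    """Return the text of every <Id> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter("Id") walks the whole tree, so the number of Ids can vary freely.
    return [el.text.strip() for el in root.iter("Id") if el.text]


ids = extract_ids(SOAP_XML)
print(ids)            # the "array" form asked for in the question
print(" ".join(ids))  # the single-string form
```

    If the list printed here matches what you expect, the same Ids are what the EvaluateXQuery workaround should split into individual flow files.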