infosphere-spl ibm-streams

XML Parse operator throws error when working with large XML file in IBM Streams


The XML Parse operator throws this error when processing large XML files: The following error occurred during XML parsing: internal error: Huge input lookup

The documentation says this was fixed in Streams 4.2.1.3, where adding this parameter to the XML Parse operator resolves it: xmlParseHuge: true;

The above parameter is not supported in lower versions of Streams. How do I fix this in Streams 4.2.1.1?


Solution

  • I found no better way to do this in Streams 4.2.1.1, so I finally decided to use the topology toolkit to write a Python operator. The XML tuples were passed through this operator, which used the xml.etree.ElementTree library to parse the XML, extract the required data, and return it as a tuple of the required type.
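
A minimal sketch of the parsing step described above, using only the standard library. The function name extract_fields, the element names, and the sample document are hypothetical and not taken from the original application; wrapping the function as a Streams operator with the topology toolkit is not shown here. xml.etree.ElementTree is built on the expat parser rather than libxml2, which is presumably why the "Huge input lookup" limit does not come into play.

```python
import xml.etree.ElementTree as ET


def extract_fields(xml_text, field_names):
    """Parse one XML document and return the requested element texts as a tuple.

    Hypothetical helper: a missing element yields None in its position.
    """
    root = ET.fromstring(xml_text)
    values = []
    for name in field_names:
        # ".//name" matches the first descendant element with that tag.
        node = root.find(".//" + name)
        values.append(node.text if node is not None else None)
    return tuple(values)


# Illustrative input only; the real XML tuples would arrive from the stream.
sample = "<order><id>42</id><customer>ACME</customer></order>"
print(extract_fields(sample, ["id", "customer"]))
```

Returning a plain tuple keeps the function independent of Streams, so it can be checked on its own before being called from the Python operator.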