I need to process an XML file of around 500 MB whose root element declares a default namespace and which contains more than 133K child elements. To do this I'm using WSO2 ESB 5 and the Smooks mediator.
Basically, I want to split the input file into small chunks with a predefined structure and send each chunk to a queue for later processing.
My first attempt was an XSLT transformation to remove the namespace from the input file, but I got an OutOfMemoryError like this:
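For comparison outside the ESB, the splitting itself can be done in a single streaming pass with the JDK's StAX API (javax.xml.stream), matching elements by namespace URI plus local name instead of by name alone. This is a swapped-in technique, not the Smooks approach used below: the class name ProductSplitter is hypothetical, the Consumer callback merely stands in for "send to a queue", and the sketch assumes every product element lives in the catalog namespace and that attributes other than xml:* are unqualified.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class ProductSplitter {
    // Default namespace declared on <catalog> in the sample input.
    static final String CATALOG_NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31";

    /** Streams the catalog and passes each <product> to onProduct as a standalone XML string. */
    static int split(Reader source, Consumer<String> onProduct) throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        inFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs or external entities
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLStreamReader r = inFactory.createXMLStreamReader(source);
        int count = 0;
        try {
            while (r.hasNext()) {
                // Match on namespace URI + local name; the default namespace has no prefix.
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && CATALOG_NS.equals(r.getNamespaceURI())
                        && "product".equals(r.getLocalName())) {
                    onProduct.accept(copySubtree(r, outFactory));
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    // Copies the element the reader is on, plus its subtree, into a string without
    // the namespace. Leaves the reader positioned on the matching END_ELEMENT.
    private static String copySubtree(XMLStreamReader r, XMLOutputFactory outFactory)
            throws XMLStreamException {
        StringWriter buf = new StringWriter();
        XMLStreamWriter w = outFactory.createXMLStreamWriter(buf);
        int depth = 0;
        while (true) {
            switch (r.getEventType()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    w.writeStartElement(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        String local = r.getAttributeLocalName(i);
                        if (XMLConstants.XML_NS_URI.equals(r.getAttributeNamespace(i))) {
                            // Keep xml:lang and similar attributes intact.
                            w.writeAttribute(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI,
                                    local, r.getAttributeValue(i));
                        } else {
                            w.writeAttribute(local, r.getAttributeValue(i));
                        }
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    depth--;
                    w.writeEndElement();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    w.writeCharacters(r.getText());
                    break;
                default:
                    break; // comments and processing instructions are skipped
            }
            if (depth == 0) {
                break;
            }
            r.next();
        }
        w.flush();
        w.close();
        return buf.toString();
    }
}
```

Because only the current product is buffered, memory use should stay roughly proportional to the largest single product rather than to the whole file.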
TID: [-1234] [] [2017-03-02 03:04:43,900] ERROR {org.apache.axis2.transport.base.threads.NativeWorkerPool} - Uncaught exception {org.apache.axis2.transport.base.threads.NativeWorkerPool}
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.axiom.om.impl.llom.factory.OMLinkedListImplFactory.createOMText(OMLinkedListImplFactory.java:192)
at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:294)
at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:250)
at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:252)
at org.apache.axiom.om.impl.llom.OMSerializableImpl.build(OMSerializableImpl.java:78)
at org.apache.axiom.om.impl.llom.OMElementImpl.build(OMElementImpl.java:722)
at org.apache.axiom.om.impl.llom.OMElementImpl.detach(OMElementImpl.java:700)
at org.apache.axiom.om.impl.llom.OMNodeImpl.setParent(OMNodeImpl.java:105)
at org.apache.axiom.om.impl.llom.OMNodeImpl.insertSiblingAfter(OMNodeImpl.java:203)
at org.apache.synapse.mediators.transform.XSLTMediator.performXSLT(XSLTMediator.java:366)
at org.apache.synapse.mediators.transform.XSLTMediator.mediate(XSLTMediator.java:202)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:97)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:59)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:210)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.axis2.transport.base.AbstractTransportListener.handleIncomingMessage(AbstractTransportListener.java:328)
at org.apache.synapse.transport.vfs.VFSTransportListener.processFile(VFSTransportListener.java:824)
at org.apache.synapse.transport.vfs.VFSTransportListener.scanFileOrDirectory(VFSTransportListener.java:472)
at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:188)
at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:134)
at org.apache.axis2.transport.base.AbstractPollingTransportListener$1$1.run(AbstractPollingTransportListener.java:67)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I don't understand why this happens, because my JVM is configured with -Xms4096m -Xmx6144m.
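The stack trace hints at the cause: XSLTMediator.performXSLT ends up in OMElementImpl.build and StAXOMBuilder.next, which suggests the XSLT mediator forces Axiom to build the complete element tree in memory, and such object trees typically take several times the size of the raw XML. If stripping the namespace is still wanted, one option (an assumption on my part, not a WSO2 ESB feature) is to pre-process the file in a streaming pass before it lands in the VFS folder. A minimal sketch using only the JDK's StAX API; NamespaceStripper is a hypothetical name, and comments and processing instructions are dropped for brevity:

```java
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class NamespaceStripper {
    /** Copies in to out event by event, writing every element without a namespace. */
    static void strip(InputStream in, OutputStream out) throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        inFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader r = inFactory.createXMLStreamReader(in);
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        try {
            w.writeStartDocument("UTF-8", "1.0");
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        // Local name only; namespace declarations are simply not copied.
                        w.writeStartElement(r.getLocalName());
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            String local = r.getAttributeLocalName(i);
                            String value = r.getAttributeValue(i);
                            if (XMLConstants.XML_NS_URI.equals(r.getAttributeNamespace(i))) {
                                w.writeAttribute(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI, local, value);
                            } else {
                                w.writeAttribute(local, value);
                            }
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        w.writeEndElement();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.SPACE:
                        w.writeCharacters(r.getText());
                        break;
                    case XMLStreamConstants.CDATA:
                        w.writeCData(r.getText());
                        break;
                    default:
                        break; // comments, PIs and END_DOCUMENT need no output here
                }
            }
            w.writeEndDocument();
            w.flush();
        } finally {
            r.close();
            w.close();
        }
    }
}
```

Since the reader never holds more than the current event, this should run in near-constant memory regardless of how many products the file contains.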
Based on that error, I decided to implement a streaming solution with Smooks instead: I defined a VFS proxy service that polls a folder and passes the file to the Smooks mediator. However, I keep getting an error that seems to be related to the namespace declaration on the root element of the input file. I say this because whenever I edit the input file and remove the namespace declaration, what I have defined and deployed on WSO2 ESB works perfectly. The catch is that I receive the large file from a backend black-box system, so I have to deal with the namespace as it is.
These are the definitions I have on my ESB:
Proxy Service
<?xml version="1.0" encoding="UTF-8"?>
<proxy xmlns="http://ws.apache.org/ns/synapse"
name="Tryzens_ProductProxy"
startOnLoad="true"
statistics="disable"
trace="disable"
transports="vfs">
<target>
<inSequence>
<log level="custom">
<property name="Tryzens_ProductProxy__tracing" value="before smooks"/>
</log>
<property name="DISABLE_SMOOKS_RESULT_PAYLOAD" value="true"/>
<smooks config-key="ProductSplitJMS_Smook">
<input type="xml"/>
<output type="xml"/>
</smooks>
<log level="custom">
<property name="Tryzens_ProductProxy__tracing" value="after smooks"/>
</log>
</inSequence>
</target>
<parameter name="transport.vfs.Streaming">true</parameter>
<parameter name="transport.PollInterval">15</parameter>
<parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
<parameter name="transport.vfs.FileURI">vfs:file:///home/jairof/wso2/00_test/working/tryzens/smook_product/</parameter>
<parameter name="transport.vfs.MoveAfterProcess">vfs:file:///home/jairof/wso2/00_test/working/tryzens/output/</parameter>
<parameter name="transport.vfs.MoveAfterFailure">vfs:file:///home/jairof/wso2/00_test/working/tryzens/fails/</parameter>
<parameter name="transport.vfs.FileNamePattern">.*.xml</parameter>
<parameter name="transport.vfs.ContentType">application/xml</parameter>
<parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
<description/>
</proxy>
Smooks configuration
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd" xmlns:xsl="http://www.milyn.org/xsd/smooks/xsl-1.1.xsd" xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd" xmlns:jms="http://www.milyn.org/xsd/smooks/jms-routing-1.2.xsd">
<params>
<param name="stream.filter.type">SAX</param>
<param name="default.serialization.on">false</param>
</params>
<resource-config selector="product">
<resource>org.milyn.delivery.DomModelCreator</resource>
</resource-config>
<jms:router routeOnElement="product" beanId="productItem_xml" destination="dynamicQueues/TestFL">
<jms:connection factory="QueueConnectionFactory"/>
<jms:jndi contextFactory="org.apache.activemq.jndi.ActiveMQInitialContextFactory" providerUrl="tcp://localhost:61616"/>
<jms:highWaterMark mark="-1"/>
</jms:router>
<ftl:freemarker applyOnElement="product">
<ftl:template>/repository/resources/smooks/product.ftl</ftl:template>
<ftl:use>
<ftl:bindTo id="productItem_xml"/>
</ftl:use>
</ftl:freemarker>
</smooks-resource-list>
Smooks template
This template is only for testing purposes; the real one covers the complete structure of the product element, but this one is enough to reproduce the error:
<#ftl ns_prefixes={"ns1": "http://www.demandware.com/xml/impex/catalog/2006-10-31"}>
<product id='${.vars["product"]["@product-id"]}'>
<ean>${product.ean}</ean>
</product>
Sample input file
Note that the actual file has more than 133K products; in this sample I cut most of the file and left only two products.
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.demandware.com/xml/impex/catalog/2006-10-31" catalog-id="tml-catalog-en">
<header>
<image-settings>
<internal-location base-path="/images"/>
<view-types>
<view-type>original</view-type>
<view-type>portrait</view-type>
<view-type>badge_GBP</view-type>
<view-type>badge_EUR</view-type>
<view-type>badge_USD</view-type>
<view-type>badge_AUD</view-type>
<view-type>badge_CZH</view-type>
<view-type>ctlimage</view-type>
<view-type>badge_FRA</view-type>
<view-type>badge_GER</view-type>
<view-type>landscape</view-type>
</view-types>
<alt-pattern>${productname}, ${variationvalue}, ${viewtype}</alt-pattern>
<title-pattern>${productname}, ${variationvalue}</title-pattern>
</image-settings>
</header>
<category category-id="MensShoes">
<display-name xml:lang="de-DE">Schuhe</display-name>
<display-name xml:lang="x-default">Shoes</display-name>
<display-name xml:lang="fr-FR">Chaussures</display-name>
<online-flag>true</online-flag>
<parent>MENSWEAR</parent>
<position>12.0</position>
<image>images/slot/landing/men_menlanding_H1_GBP.jpg</image>
<template/>
<page-attributes/>
<custom-attributes>
<custom-attribute attribute-id="categoryRecommendationsEnable">false</custom-attribute>
<custom-attribute attribute-id="enableCompare">false</custom-attribute>
<custom-attribute attribute-id="enableGridItemButtonStrip">false</custom-attribute>
<custom-attribute attribute-id="enableGridItemMobileButtonStrip">false</custom-attribute>
<custom-attribute attribute-id="enableUserJourney">false</custom-attribute>
<custom-attribute attribute-id="enableWishlist">false</custom-attribute>
<custom-attribute attribute-id="fitsme_enabled">false</custom-attribute>
<custom-attribute attribute-id="rrGenere">false</custom-attribute>
<custom-attribute attribute-id="rsCategoryEnabled">false</custom-attribute>
<custom-attribute attribute-id="shopAllButton">false</custom-attribute>
<custom-attribute attribute-id="showInMenu">true</custom-attribute>
<custom-attribute attribute-id="showInMobileMenu">false</custom-attribute>
<custom-attribute attribute-id="show_alternate_image_on_plp">false</custom-attribute>
<custom-attribute attribute-id="slotBannerImage">images/slot/landing/men_menlanding_H1_GBP.jpg</custom-attribute>
</custom-attributes>
</category>
<category category-id="P50 SUIT">
<display-name xml:lang="de-DE">Hosen</display-name>
<display-name xml:lang="x-default">Trousers</display-name>
<display-name xml:lang="fr-FR">Pantalons</display-name>
<online-flag>true</online-flag>
<parent>WomensTailoring</parent>
<position>0.0</position>
<template/>
<page-attributes/>
</category>
<product product-id="0">
<ean/>
<upc/>
<unit/>
<min-order-quantity>1</min-order-quantity>
<step-quantity>1</step-quantity>
<store-force-price-flag>false</store-force-price-flag>
<store-non-inventory-flag>false</store-non-inventory-flag>
<store-non-revenue-flag>false</store-non-revenue-flag>
<store-non-discountable-flag>false</store-non-discountable-flag>
<online-flag>false</online-flag>
<available-flag>true</available-flag>
<searchable-flag>true</searchable-flag>
<images>
<image-group view-type="badge_EUR">
<image path="badge/blank.png"/>
</image-group>
<image-group view-type="badge_GBP">
<image path="badge/blank.png"/>
</image-group>
<image-group view-type="badge_GER">
<image path="badge/blank.png"/>
</image-group>
<image-group view-type="badge_USD">
<image path="badge/blank.png"/>
</image-group>
</images>
<page-attributes/>
<pinterest-enabled-flag>false</pinterest-enabled-flag>
<facebook-enabled-flag>false</facebook-enabled-flag>
<store-attributes>
<force-price-flag>false</force-price-flag>
<non-inventory-flag>false</non-inventory-flag>
<non-revenue-flag>false</non-revenue-flag>
<non-discountable-flag>false</non-discountable-flag>
</store-attributes>
</product>
<product product-id="12024">
<ean/>
<upc/>
<unit/>
<min-order-quantity>1</min-order-quantity>
<step-quantity>1</step-quantity>
<store-force-price-flag>false</store-force-price-flag>
<store-non-inventory-flag>false</store-non-inventory-flag>
<store-non-revenue-flag>false</store-non-revenue-flag>
<store-non-discountable-flag>false</store-non-discountable-flag>
<online-flag>false</online-flag>
<available-flag>true</available-flag>
<searchable-flag>true</searchable-flag>
<images>
<image-group view-type="original">
<image path="original/12024_original_original.jpg"/>
</image-group>
</images>
<brand>J FRANCOMB</brand>
<page-attributes/>
<custom-attributes>
<custom-attribute attribute-id="allocGroup">X</custom-attribute>
<custom-attribute attribute-id="colour">
<value>3PNK-PINK</value>
</custom-attribute>
<custom-attribute attribute-id="cuffType">
<value>SINGLE CUFF</value>
</custom-attribute>
<custom-attribute attribute-id="enable_pdp_asset_footer_layout">false</custom-attribute>
<custom-attribute attribute-id="fabric">
<value>LEWIN 100 PD</value>
</custom-attribute>
<custom-attribute attribute-id="fit">SEMI FIT</custom-attribute>
<custom-attribute attribute-id="gender">
<value>M</value>
</custom-attribute>
<custom-attribute attribute-id="look">PTRN447</custom-attribute>
<custom-attribute attribute-id="pattern">
<value>PATTERN</value>
</custom-attribute>
<custom-attribute attribute-id="productIDCIMS">12024</custom-attribute>
<custom-attribute attribute-id="retailTypeCIMS">M FORMAL</custom-attribute>
<custom-attribute attribute-id="seasonCIMS">307B</custom-attribute>
<custom-attribute attribute-id="styleName">MILSC PATTERN DOOM AND BLOOM</custom-attribute>
<custom-attribute attribute-id="styleNameCIMS">MILSC PATTERN DOOM AND BLOOM</custom-attribute>
<custom-attribute attribute-id="styleNumberCIMS">MS17</custom-attribute>
<custom-attribute attribute-id="typeDesc">MS SHIRTS</custom-attribute>
<custom-attribute attribute-id="weight">0.3</custom-attribute>
</custom-attributes>
<options>
<shared-option option-id="sleeveLengthAlteration"/>
<shared-option option-id="giftBox"/>
</options>
<variations>
<attributes>
<shared-variation-attribute attribute-id="collarSize" variation-attribute-id="collarSize"/>
<shared-variation-attribute attribute-id="sleeveLength" variation-attribute-id="sleeveLength"/>
</attributes>
</variations>
<classification-category>S17 MILAN</classification-category>
<pinterest-enabled-flag>false</pinterest-enabled-flag>
<facebook-enabled-flag>false</facebook-enabled-flag>
<store-attributes>
<force-price-flag>false</force-price-flag>
<non-inventory-flag>false</non-inventory-flag>
<non-revenue-flag>false</non-revenue-flag>
<non-discountable-flag>false</non-discountable-flag>
</store-attributes>
</product>
<category-assignment category-id="T43 HERITAGE" product-id="505158991125">
<primary-flag>true</primary-flag>
</category-assignment>
<category-assignment category-id="U30 BOXERS" product-id="505158774834"/>
<recommendation source-id="58462" source-type="product" target-id="505158886294" type="4"/>
</catalog>
Error in wso2carbon.log file
TID: [-1234] [] [2017-03-02 12:15:27,793] INFO {org.apache.synapse.mediators.builtin.LogMediator} - Tryzens_ProductProxy__tracing = before smooks {org.apache.synapse.mediators.builtin.LogMediator}
TID: [-1234] [] [2017-03-02 12:15:28,376] ERROR {freemarker.runtime} - {freemarker.runtime}
Error on line 3, column 12 in repository/resources/smooks/product.ftl
Expecting a string, date or number here, Expression product.ean is instead a freemarker.ext.dom.NodeListModel
The problematic instruction:
----------
==> ${product.ean} [on line 3, column 10 in repository/resources/smooks/product.ftl]
----------
Java backtrace for programmers:
----------
freemarker.core.NonStringException: Error on line 3, column 12 in repository/resources/smooks/product.ftl
Expecting a string, date or number here, Expression product.ean is instead a freemarker.ext.dom.NodeListModel
at freemarker.core.Expression.getStringValue(Expression.java:126)
at freemarker.core.Expression.getStringValue(Expression.java:93)
at freemarker.core.DollarVariable.accept(DollarVariable.java:76)
at freemarker.core.Environment.visit(Environment.java:209)
at freemarker.core.MixedContent.accept(MixedContent.java:92)
at freemarker.core.Environment.visit(Environment.java:209)
at freemarker.core.Environment.process(Environment.java:189)
at freemarker.template.Template.process(Template.java:237)
at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.applyTemplate(FreeMarkerTemplateProcessor.java:358)
at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.applyTemplate(FreeMarkerTemplateProcessor.java:346)
at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.visitAfter(FreeMarkerTemplateProcessor.java:333)
at org.milyn.delivery.sax.SAXHandler.visitAfter(SAXHandler.java:389)
at org.milyn.delivery.sax.SAXHandler.endElement(SAXHandler.java:204)
at org.milyn.delivery.SmooksContentHandler.endElement(SmooksContentHandler.java:96)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
at org.milyn.Smooks._filter(Smooks.java:526)
at org.milyn.Smooks.filterSource(Smooks.java:482)
at org.wso2.carbon.mediator.transform.SmooksMediator.mediate(SmooksMediator.java:146)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:97)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:59)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:210)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.axis2.transport.base.AbstractTransportListener.handleIncomingMessage(AbstractTransportListener.java:328)
at org.apache.synapse.transport.vfs.VFSTransportListener.processFile(VFSTransportListener.java:824)
at org.apache.synapse.transport.vfs.VFSTransportListener.scanFileOrDirectory(VFSTransportListener.java:472)
at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:188)
at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:134)
at org.apache.axis2.transport.base.AbstractPollingTransportListener$1$1.run(AbstractPollingTransportListener.java:67)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Please help; I would appreciate any comments on how to solve this issue. Thanks in advance.
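In the Smooks template (.ftl file), if you want to use an expression like ${product.ean}, you must define the "product" variable:
<#assign product = .vars["product"]>
In your XML input file, all nodes belong to the same default namespace, "http://www.demandware.com/xml/impex/catalog/2006-10-31". Declaring the ns1 prefix alone does not help: as long as the template declares no default namespace, an unprefixed name such as ean only matches elements that are in no namespace, so product.ean evaluates to an empty node list, which is the NodeListModel the error complains about (with ns1 declared, product["ns1:ean"] should match instead).
You can declare this default namespace in the FTL with the reserved prefix "D", after which unprefixed names such as ean resolve against it:
<#ftl ns_prefixes={"D":"http://www.demandware.com/xml/impex/catalog/2006-10-31"}>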
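The same kind of mismatch is easy to reproduce outside FreeMarker, because in XPath 1.0 an unprefixed name in a path always means "no namespace". The following Java sketch uses only the JDK (javax.xml.xpath); the class name NamespaceLookupDemo and the prefix cat are arbitrary choices for illustration. On a namespaced catalog, /catalog/product matches nothing while the prefixed path matches both products, and unprefixed attributes such as @product-id are unaffected, which is consistent with the template failing on line 3 (ean) rather than on line 2 (@product-id).

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceLookupDemo {
    static final String CATALOG_NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // without this, namespaces are ignored entirely
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Counts the nodes matched by expr, with the prefix "cat" bound to the catalog namespace.
    static int count(Document doc, String expr) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "cat".equals(prefix) ? CATALOG_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return CATALOG_NS.equals(uri) ? "cat" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return CATALOG_NS.equals(uri)
                        ? Collections.singletonList("cat").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```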