Parsing XML with multiple headers in Talend


I'm trying to read XML data I receive that contains multiple XML headers (declarations).

Example:

<?xml version="1.0" encoding="utf-8"?>
<RepeaterData>
    <Version />
    <Items>
        <Item>
            <year>2017</year>
            <Additional>
                <?xml version="1.0" encoding="utf-8"?>
                <RepeaterData>
                    <Version />
                    <Items>
                        <Name>toto</Name>
                    </Items>
                </RepeaterData>
            </Additional>
        </Item>
        <Item>
            <year>2018</year>
            <Additional>
                <?xml version="1.0" encoding="utf-8"?>
                <RepeaterData>
                    <Version />
                    <Items>
                        <Item>
                            <element type="System.String">3</element>
                            <Name type="System.String">toto</Name>
                        </Item>
                        <Item>
                            <element type="System.String">3</element>
                            <Name type="System.String">tata</Name>
                        </Item>
                    </Items>
                </RepeaterData>
            </Additional>
        </Item>
    </Items>
</RepeaterData>

I also tried to delete the XML header with StringHandling.EREPLACE(b, "<?xml version=1.0 encoding=utf-8?>", ""), but it doesn't work.
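The EREPLACE attempt most likely fails because, as far as I know, Talend's StringHandling.EREPLACE treats its second argument as a Java regular expression (it delegates to String.replaceAll). In a regex, "?" is a quantifier, so "<?xml" means "an optional < followed by xml", and the pattern also leaves out the double quotes around 1.0 and utf-8, so it never matches the real header. The plain-Java sketch below (the class and method names are hypothetical) escapes the "?" and uses [^>]* so that any declaration matches regardless of its attributes:

```java
import java.util.regex.Pattern;

class StripXmlDeclarations {

    // Matches an XML declaration such as <?xml version="1.0" encoding="utf-8"?>.
    // "?" is escaped because it is a regex quantifier; [^>]* accepts any
    // attributes and quoting style without running past the closing ">".
    static final Pattern XML_DECL = Pattern.compile("<\\?xml[^>]*\\?>");

    // Removes every XML declaration from the input string.
    static String stripDeclarations(String xml) {
        return XML_DECL.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the question's input.
        String b = "<?xml version=\"1.0\" encoding=\"utf-8\"?><RepeaterData><Items><Item>"
                + "<Additional><?xml version=\"1.0\" encoding=\"utf-8\"?><RepeaterData/>"
                + "</Additional></Item></Items></RepeaterData>";

        // The pattern from the question: unescaped "?" and no quotes, so nothing matches.
        String attempt = b.replaceAll("<?xml version=1.0 encoding=utf-8?>", "");
        System.out.println(attempt.equals(b)); // true: nothing was removed

        // Prints <RepeaterData><Items><Item><Additional><RepeaterData/></Additional></Item></Items></RepeaterData>
        System.out.println(stripDeclarations(b));
    }
}
```

In a tMap or tJavaRow expression the equivalent call would presumably be StringHandling.EREPLACE(b, "<\\?xml[^>]*\\?>", ""); this has not been tested inside Talend itself.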

Any help would be appreciated.


Solution

  • Here's a quick-and-dirty solution: strip all the XML headers from your input with a tReplace, then write the result to a file that already contains an XML header (your first header was deleted along with the others, so it has to be written back).

    tFileInputFullRow_1 stands in for your tRestClient (wherever the XML comes from in your job).

    tFileOutputDelimited_2 writes the XML header produced by tFixedFlowInput_1.

    tFileOutputDelimited_1 writes to the same file as tFileOutputDelimited_2 with Append mode enabled, so the cleaned data is added after the header.

    You can then read the resulting XML file.
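If you would rather not go through an intermediate file, the same idea (strip every declaration, then put a single header back in front) can be sketched in plain Java, for instance inside a tJavaRow or a custom routine. This is a sketch under assumptions, not the job described above: the class RebuildXml, its methods and the sample input are hypothetical, and the final step only checks that the JDK's DOM parser accepts the rebuilt document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class RebuildXml {

    // The single header written in front of the cleaned body
    // (the role played by tFixedFlowInput_1 and the header file).
    static final String HEADER = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";

    // Removes every XML declaration (the tReplace step), then prepends
    // exactly one header (the Append-mode step).
    static String rebuild(String raw) {
        String body = raw.replaceAll("<\\?xml[^>]*\\?>", "");
        return HEADER + body.trim();
    }

    // Parses the rebuilt document with the JDK's built-in DOM parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, shortened version of the question's input.
        String raw = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<RepeaterData><Items>"
                + "<Item><year>2017</year><Additional>"
                + "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<RepeaterData><Items><Name>toto</Name></Items></RepeaterData>"
                + "</Additional></Item>"
                + "<Item><year>2018</year><Additional>"
                + "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<RepeaterData><Items><Item><Name>toto</Name></Item>"
                + "<Item><Name>tata</Name></Item></Items></RepeaterData>"
                + "</Additional></Item>"
                + "</Items></RepeaterData>";

        Document doc = parse(rebuild(raw));
        System.out.println(doc.getElementsByTagName("year").getLength()); // 2
        System.out.println(doc.getElementsByTagName("Name").getLength()); // 3
    }
}
```

Writing the header back is optional in XML 1.0 terms, since a document without a declaration defaults to UTF-8; it mainly keeps the output identical to what the original sender intended.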