I'm trying to use a Blob source containing nested (1 level) XML documents and convert them to nested JSON (1 level)... similar to the following:
Input XML Document
<RootElement>
<RootChild>rootchild</RootChild>
<RootRepeatingElement>
<RootRepeatingElementChild>rootrepeatingelementchild1</RootRepeatingElementChild>
</RootRepeatingElement>
<RootRepeatingElement>
<RootRepeatingElementChild>rootrepeatingelementchild2</RootRepeatingElementChild>
</RootRepeatingElement>
</RootElement>
Output JSON Document
{
"RootChild": "rootchild",
"RootRepeatingElement": [
{
"RootRepeatingElementChild": "rootrepeatingelementchild1"
},
{
"RootRepeatingElementChild": "rootrepeatingelementchild2"
}
]
}
I've tried using the Copy Item mapping, but can't seem to figure out how to do the nested elements.
Is this possible, if so is there a walk through link you can provide or explain how to map the repeating XML elements to a repeating JSON array. If not, is there an alternative?
Thank you.
You can follow below procedure to get your desired Json output:
Use Mapping dataflow to get the requirement. Add your XML file as source in data flow, add flatten transformation to the source, Select the RootRepeatingElement
as the Unroll by field, create input columns as below:
RootElement.RootChild
RootElement.RootRepeatingElement
Add aggregate transformation to flatten transformation to convert the repeated elements back into a JSON array.
collect(RootRepeatingElement)
as aggregate column.Add sink to the aggregate transformation, debug the dataflow, you will get Json as shown below:
{"RootChild":"rootchild","RootRepeatingElement":[{"RootRepeatingElementChild":"rootrepeatingelementchild1"},{"RootRepeatingElementChild":"rootrepeatingelementchild2"}]}
Here is the dataflow Json for your reference:
{
"name": "dataflow1",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
{
"dataset": {
"referenceName": "srcxml",
"type": "DatasetReference"
},
"name": "source1"
}
],
"sinks": [
{
"dataset": {
"referenceName": "Json1",
"type": "DatasetReference"
},
"name": "sink1"
}
],
"transformations": [
{
"name": "flatten1"
},
{
"name": "aggregate1"
}
],
"scriptLines": [
"source(output(",
" RootElement as (RootChild as string, RootRepeatingElement as (RootRepeatingElementChild as string)[])",
" ),",
" allowSchemaDrift: true,",
" validateSchema: false,",
" ignoreNoFilesFound: false,",
" validationMode: 'none',",
" namespaces: true) ~> source1",
"source1 foldDown(unroll(RootElement.RootRepeatingElement),",
" mapColumn(",
" RootChild = RootElement.RootChild,",
" RootRepeatingElement = RootElement.RootRepeatingElement",
" ),",
" skipDuplicateMapInputs: false,",
" skipDuplicateMapOutputs: false) ~> flatten1",
"flatten1 aggregate(groupBy(RootChild),",
" RootRepeatingElement = collect(RootRepeatingElement)) ~> aggregate1",
"aggregate1 sink(allowSchemaDrift: true,",
" validateSchema: false,",
" umask: 0022,",
" preCommands: [],",
" postCommands: [],",
" skipDuplicateMapInputs: true,",
" skipDuplicateMapOutputs: true) ~> sink1"
]
}
}
}