I need to remove everything thing starting with "[@" and ending with "]" in the given XML. The text that falls in-between these sqaure brackers should also be removed. I tried the replace function but it turned the XML into a String and did not remove the full string including [@ and ]. Please see below sample input and expected output;
Sample input. See the [@ inside the < differences> tag:
<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
<data id="FF2AD08D621">
<StateFlag>0</StateFlag>
<DCC>B</DCC>
<LastName>MMNGorthy</LastName>
<Name>Scott</Name>
<Structure>NonProfit</Structure>
<BusinessType>Engineering</BusinessType>
<meams>AI</meams>
<Count>0</Count>
<CTier>000</CTier>
<CTCond>111</CTCond>
<CTPl>222</CTPl>
<DO>1</DO>
<AuthObject>0</AuthObject>
</data>
<differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
</differences>
</ns0:trans>
Expected output. Notice how every text that starts with "[@" has been removed under < differences> tag:
<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
<data id="FF2AD08D621">
<StateFlag>0</StateFlag>
<DCC>B</DCC>
<LastName>MMNGorthy</LastName>
<Name>Scott</Name>
<Structure>NonProfit</Structure>
<BusinessType>Engineering</BusinessType>
<meams>AI</meams>
<Count>0</Count>
<CTier>000</CTier>
<CTCond>111</CTCond>
<CTPl>222</CTPl>
<DO>1</DO>
<AuthObject>0</AuthObject>
</data>
<differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session/properties/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session/properties/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session/properties/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
</differences>
</ns0:trans>
Here is the code that I tried:
%dw 2.0
import * from dw::core::Strings
output text/plain
---
(write(payload, "application/json") replace "[@" with "") replace "]" with ""
Output of my code has made everything Null under < differences> tag:
{
"trans": {
"data": {
"StateFlag": "0",
"DCC": "B",
"LastName": "MMNGorthy",
"Name": "Scott",
"Structure": "NonProfit",
"BusinessType": "Engineering",
"meams": "AI",
"Count": "0",
"CTier": "000",
"CTCond": "111",
"CTPl": "222",
"DO": "1",
"AuthObject": "0"
},
"differences": {
"difference": null,
"difference": null,
"difference": null
}
}
}
Replacing over the entire XML as a string causes the script to lose the structure provided by the XML object, which would make the transformation more fragile and prone to errors. What you really should do is to use a recursive function to transverse the input object, using pattern-matching to identify the current type and transform the values inside object attributes only. I recommend to get familiar with the basic structure of such functions because they are commonly needed.
fun recursiveFunction(x)=
x match {
case o is Object -> o mapObject ((value, key) -> (key): recursiveFunction(value)
case a is Array -> a map recursiveFunction($)
case s is String -> recursiveFunction(s)
case n is Number -> recursiveFunction(n)
...
else -> x
}
Note that in XML there is no concept of arrays, like in JSON, however I added a check for them anyway in case the input changes. JSON in the other hand does not has a concept of attributes so it may not be useful for you. You can remove or comment it.
I also added a check for the value of string XML elements. If you only want to transform attributes you can remove it.
To make it more generic we can pass the actual regular expression as a parameter. This is the recursive function
fun removeRegex(x, r: Regex)=
x match {
case o is Object -> o mapObject ((value, key) -> (key) @((removeInAttributes(log(key.@), r))): removeRegex(value, r) )
case s is String -> removeInStrings(s, r) // if needed for element values too
case a is Array -> a map removeRegex($, r) // XML doesn't has arrays but I'm leaving this branch in case the input format changes
else -> x
}
About the actual replacement of the string value, the script in the question only replaces the delimiters. You want to remove also the content between those delimiters. For that we can use a regular expression. We need to be careful to stop at the first ending delimiter or it could consume the text between two separate square brackets sections.
/\[@[^\]]+]/
I moved the actual removal to a separate function for clarity and because I reused it for Strings. In case the input object doesn't has attributes I used a second definition of the function to catch it without having to check for nulls.
fun removeInAttributes(o: Object, r: Regex) = o mapObject ((value, key) -> (key): removeInStrings(value, r) )
fun removeInAttributes(n: Null, r: Regex) = null // if there are no attributes do nothing
fun removeInStrings(s: String, r: Regex) = s replace r with ("")
Finally you just call the main function with the input XML and the regular expression:
---
removeRegex(payload, /\[@[^\]]+]/)
Output:
<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
<data id="FF2AD08D621">
<StateFlag>0</StateFlag>
<DCC>B</DCC>
<LastName>MMNGorthy</LastName>
<Name>Scott</Name>
<Structure>NonProfit</Structure>
<BusinessType>Engineering</BusinessType>
<meams>AI</meams>
<Count>0</Count>
<CTier>000</CTier>
<CTCond>111</CTCond>
<CTPl>222</CTPl>
<DO>1</DO>
<AuthObject>0</AuthObject>
</data>
<differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session/properties/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session/properties/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
<difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session/properties/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
</differences>
</ns0:trans>