I am using ElementTree to modify an XLIFF file with text contained in an Excel sheet. I want to go through the entire file and identify the elements for which I have a match in my Excel sheet (the match is based on the segment id, which is stored in the "mid" attribute). Once I find a match, I want to populate the element with text pulled from the Excel sheet. For this example I am using the dummy text "Target Segment{segment id}".
In memory, my code does everything I want. I can identify each element and pull the element text and attributes as needed. I set the text value of the element and can see the difference when I print the results: "mrk.text" is None before, and after setting the new value it is populated with the correct dummy text. So everything looks like it is working correctly.
BUT when I generate the XML file, the element text is still empty. Meanwhile, the other modifications I made to the XML (for example, registering namespaces and including the XML declaration) work fine.
I am expecting text to appear in "mrk" elements that are children of "target" elements. But nothing gets written there.
I am not sure what I am doing wrong.
I have read through the xml.etree.ElementTree documentation on python.org and have searched for an answer on this site and several others. I found answers that hint at a possible solution, but nothing quite does it.
(I know that my tag references can be made without explicitly spelling out the namespace URIs, but I am new to ElementTree and wanted to solve my problem first before improving my code.)
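For reference, a minimal sketch of that shorter form, assuming a prefix map passed to `find`/`findall`. The name `ns`, the alias `x`, and the inline sample are my own choices for illustration and are not part of my real script:

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix map: "x" is an arbitrary local alias, not a prefix from the file.
ns = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Cut-down stand-in for the real file, just enough to search against.
sample = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file><body>
    <trans-unit id="a">
      <source>Foo</source>
      <target><mrk mtype="seg" mid="1328"/></target>
    </trans-unit>
  </body></file>
</xliff>"""

root = ET.fromstring(sample)

# Same searches as in my code, written with the alias instead of the full URI.
for tu in root.findall(".//x:trans-unit", ns):
    source = tu.find("x:source", ns)
    mids = [mrk.get("mid") for mrk in tu.findall("x:target/x:mrk", ns)]
    print(source.text, mids)
```

The alias only affects how the search paths are written; the prefixes used in the output file are still controlled by `ET.register_namespace`.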
Sample XML that I am trying to modify is here:
<trans-unit id="f60d234c-2d06-47e7-b4aa-e2c7a7caf0e8">
<source>Please select all that apply.</source>
<seg-source>
<mrk mtype="seg"
mid="1751">Please select all that apply.</mrk>
</seg-source>
<target>
<mrk mtype="seg"
mid="1751"/>
</target>
</trans-unit>
Relevant python code here:
beolroot = ET.parse(filetobeol).getroot()
for tu in beolroot.findall(".//{urn:oasis:names:tc:xliff:document:1.2}trans-unit"):
    ET.register_namespace("sdl", "http://sdl.com/FileTypes/SdlXliff/1.0")
    ET.register_namespace("", "urn:oasis:names:tc:xliff:document:1.2")
    #print (tu)
    srctxt = tu.find("./{urn:oasis:names:tc:xliff:document:1.2}source")
    trg = tu.find("./{urn:oasis:names:tc:xliff:document:1.2}target")
    #print (srctxt)
    print (srctxt.text)
    #print (trg)
    for target in tu.findall("./{urn:oasis:names:tc:xliff:document:1.2}target"):
        for mrk in target.findall("./{urn:oasis:names:tc:xliff:document:1.2}mrk"):
            print ("mrk is element id " + str(mrk))
            print ("mrk text is: " +str(mrk.text))
            mid = mrk.get("mid")
            print ("segment id is: " +str(mid))
            if mid in srctrgmap.keys():
                mrk = target.find("./{urn:oasis:names:tc:xliff:document:1.2}mrk")
                targetvalue = srctrgmap[mid]
                #print(targetvalue)
                mrk.text = str(targetvalue)
                target.text = str(targetvalue)
                print ("mrk is STILL element id: " + str(mrk))
                print ("new mrk text is: " +str(mrk.text))
                print ("new target text is: " +str(target.text))
            else:
                print("Segment Number " + str(mid) + " has no translation target text")
tree.write("output.sdlxliff", encoding="utf-8", xml_declaration=True)
The following code works. As suggested, I worked on a minimal reproducible example, and in doing so I produced a version that works. I am not certain what was wrong with the original, but this code now does what I need it to do.
Example xml here:
<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0"
xmlns="urn:oasis:names:tc:xliff:document:1.2"
version="1.2"
sdl:version="1.0">
<file original="C:\File\Location\example.xml">
<body>
<trans-unit id="a">
<source>Foo</source>
<seg-source>
<mrk mtype="seg"
mid="1328">Foo</mrk>
</seg-source>
<target>
<mrk mtype="seg"
mid="1328"/>
</target>
<sdl:seg-defs>Bar</sdl:seg-defs>
</trans-unit>
<trans-unit id="b">
<source>My Hovercraft</source>
<seg-source>
<mrk mtype="seg"
mid="1329">My Hovercraft</mrk>
</seg-source>
<target>
<mrk mtype="seg"
mid="1329"/>
</target>
<sdl:seg-defs>Is full of eels</sdl:seg-defs>
</trans-unit>
<trans-unit id="c">
<source>I will not buy this record</source>
<seg-source>
<mrk mtype="seg"
mid="1330">I will not buy this record</mrk>
</seg-source>
<target>
<mrk mtype="seg"
mid="1330"/>
</target>
<sdl:seg-defs>It is scratched</sdl:seg-defs>
</trans-unit>
<trans-unit id="d">
<source>I will not buy this tobacconist</source>
<seg-source>
<mrk mtype="seg"
mid="1331">I will not buy this tobacconist</mrk>
</seg-source>
<target>
<mrk mtype="seg"
mid="1331"/>
</target>
<sdl:seg-defs>It is scratched</sdl:seg-defs>
</trans-unit>
<trans-unit id="f">
<source>I want to buy</source>
<seg-source>
<mrk mtype="seg"
mid="1332">I want to buy</mrk>
</seg-source>
<target>
<mrk mtype="seg"
mid="1332"/>
</target>
<sdl:seg-defs>Some Matches</sdl:seg-defs>
</trans-unit>
</body>
</file>
</xliff>
Working python here:
import xml.etree.ElementTree as ET

filetobeol = 'D:\\Stack\\example.xml'
srctrgmap = {'1328': 'Target Segment1328',
             '1330': 'Target Segment1330',
             '1332': 'Target Segment1332'
             }

# Register the prefixes once, before writing, so the output keeps "sdl" and the default namespace.
ET.register_namespace("sdl", "http://sdl.com/FileTypes/SdlXliff/1.0")
ET.register_namespace("", "urn:oasis:names:tc:xliff:document:1.2")

# Keep a reference to the parsed tree; this same object is written out at the end.
tree = ET.parse(filetobeol)
beolroot = tree.getroot()

for tu in beolroot.findall(".//{urn:oasis:names:tc:xliff:document:1.2}trans-unit"):
    for target in tu.findall("./{urn:oasis:names:tc:xliff:document:1.2}target"):
        for mrk in target.findall("./{urn:oasis:names:tc:xliff:document:1.2}mrk"):
            print("mrk is element id " + str(mrk))
            print("mrk text is: " + str(mrk.text))
            mid = mrk.get("mid")
            print("segment id is: " + str(mid))
            if mid in srctrgmap:
                targetvalue = srctrgmap[mid]
                # Set the text on the mrk being iterated; target.find() would always
                # return the first mrk, which is wrong if a target holds several.
                mrk.text = str(targetvalue)
                print("new mrk text is: " + str(mrk.text))
            else:
                print("Segment Number " + str(mid) + " has no translation target text")

tree.write("D:\\Stack\\outexample.xml", encoding="utf-8", xml_declaration=True)
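One possible explanation, which I cannot confirm: in the first version, `beolroot` came from its own `ET.parse(filetobeol).getroot()` call, while `tree.write(...)` was called on a `tree` variable that is not assigned anywhere in the snippet. If `tree` had been created by an earlier, separate parse, my edits would have gone into one tree while a different, untouched tree was written. The sketch below reproduces that situation with an inline string; the cut-down sample and the variable names `other_root`, `written_before`, and `written_after` are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

XLIFF = "urn:oasis:names:tc:xliff:document:1.2"
sample = (
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
    '<target><mrk mtype="seg" mid="1328"/></target>'
    '</xliff>'
)

# Two independent parses of the same document give two unrelated trees.
tree = ET.ElementTree(ET.fromstring(sample))  # stands in for an earlier ET.parse(...)
other_root = ET.fromstring(sample)            # stands in for ET.parse(...).getroot()

# Editing the second parse looks fine when printed...
mrk = other_root.find(f".//{{{XLIFF}}}mrk")
mrk.text = "Target Segment1328"
print(mrk.text)

# ...but the first tree, which is the one being serialized, never saw the change.
written_before = ET.tostring(tree.getroot(), encoding="unicode")
print("Target Segment1328" in written_before)  # False

# Editing through the tree that is actually written does show up.
tree.getroot().find(f".//{{{XLIFF}}}mrk").text = "Target Segment1328"
written_after = ET.tostring(tree.getroot(), encoding="unicode")
print("Target Segment1328" in written_after)   # True
```

This would match what I saw: the prints showed the new text, and the written file did not. The working version avoids the problem by taking the root from the same `tree` object that is later written.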