Say I have an XML document like this one:
<a>
<na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com"
Key="value">
<b>
<c>some_c_attrib</c>
<d>some_d_attrib</d>
<e>some_e_attrib</e>
<f>some_f_attrib</f>
<g>some_g_attrib</g>
</b>
<h>
<i>some_i_attrib</i>
<j>some_j_attrib</j>
</h>
</na:Data>
<da:Newtag xmlns:da="http://new_site.com">
<k name="http://new_new_site.com"/>
There are a few more lines after this. I parsed the XML using ET.parse(FILENAME) and then wrote it to a new file using write_c14n("new.xml"). I now want to extract a portion of new.xml into another XML file: just the part starting at <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value"> and ending at </h>.
However, I do not want to use tostring(), as it doesn't retain the canonicalization of the XML obtained from write_c14n(). I wondered whether copying just that portion from new.xml and writing it to another XML file would help, but that seems to add extra newlines in between and does not keep the XML formatted exactly as it was.
I have tried the following. Here I tried creating another XML tree with <na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value"> as the new root:
from lxml import etree

tree = etree.parse('file_location/file_to_read.xml')
root = tree.getroot()
# New root built from the tag name only: no prefix mapping, no attributes
sub_root = etree.Element('{http://some_site.com#}Data')
data = root.find('.//na:Data', namespaces={'na': 'http://some_site.com#'})
# Move each child of the original na:Data element under the new root
for node in list(data):
    sub_root.append(node)
new_tree = etree.ElementTree(sub_root)
I just need the new_tree object so I can keep working with it. However, if I print it using tostring() [i.e. print etree.tostring(new_tree, pretty_print=True)], this is the output I get:
<ns0:Data xmlns:ns0="http://some_site.com#"><b>
<c>some_c_attrib</c>
<d>some_d_attrib</d>
<e>some_e_attrib</e>
<f>some_f_attrib</f>
<g>some_g_attrib</g>
</b>
<h>
<i>some_i_attrib</i>
<j>some_j_attrib</j>
</h>
</ns0:Data>
As you can see, na:Data got replaced by ns0:Data, and its attributes (Ref="http://another_site.com" Key="value") are missing. I need a way to extract a portion of the XML as it is, with all its attributes and their values.
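For context on the output above: the ns0 prefix appears because etree.Element('{http://some_site.com#}Data') is created with no prefix mapping, and Ref and Key vanish because only the children are moved, never the attributes. Below is a minimal sketch of how a hand-built root could keep both, assuming lxml and a shortened inline copy of the sample (the names XML, NS, and result are illustrative, not from the original code):

```python
from lxml import etree

# Shortened inline copy of the sample document (an assumption for this sketch).
XML = b"""<a>
<na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value">
<b><c>some_c_attrib</c></b>
<h><i>some_i_attrib</i></h>
</na:Data>
</a>"""

NS = {"na": "http://some_site.com#"}

root = etree.fromstring(XML)
data = root.find(".//na:Data", namespaces=NS)

# nsmap keeps the "na" prefix; attrib carries over Ref and Key.
sub_root = etree.Element(
    "{http://some_site.com#}Data", attrib=dict(data.attrib), nsmap=NS
)
sub_root.text = data.text  # keep the whitespace before the first child

# list(data) takes a snapshot, because append() moves each child out of `data`.
for child in list(data):
    sub_root.append(child)

result = etree.tostring(sub_root)
print(result.decode("utf-8"))
```

This works, but it rebuilds by hand what the parsed element already carries.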
There is no need to create new elements. Just parse the original XML file, extract the na:Data child element, and write it to a new file.
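from lxml import etree

tree = etree.parse('file_location/file_to_read.xml')
# The parsed element keeps its own prefix, namespace declaration, and attributes
Data = tree.find('.//na:Data', namespaces={'na':'http://some_site.com#'})
# Wrap the element in its own tree and write it out in canonical form
etree.ElementTree(Data).write_c14n("new.xml")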
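To check the result without touching the file system, the same approach can be tried on an in-memory document. This is only a sketch: the inline XML is a shortened copy of the sample, and the names XML, buf, and canonical are illustrative. It assumes lxml's write_c14n accepts a file-like object such as BytesIO.

```python
from io import BytesIO
from lxml import etree

# Shortened inline copy of the sample document (an assumption for this sketch).
XML = b"""<a>
<na:Data xmlns:na="http://some_site.com#" Ref="http://another_site.com" Key="value">
<b><c>some_c_attrib</c></b>
<h><i>some_i_attrib</i></h>
</na:Data>
<da:Newtag xmlns:da="http://new_site.com"><k name="http://new_new_site.com"/></da:Newtag>
</a>"""

tree = etree.parse(BytesIO(XML))
data = tree.find(".//na:Data", namespaces={"na": "http://some_site.com#"})

# Write only the na:Data subtree, in canonical form, into a bytes buffer.
buf = BytesIO()
etree.ElementTree(data).write_c14n(buf)
canonical = buf.getvalue()
print(canonical.decode("utf-8"))
```

The na prefix, its namespace declaration, and both attributes survive, and the sibling da:Newtag element is left out. Canonical XML sorts attributes, so Key is written before Ref regardless of their order in the source.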