I am using Python to make some conditional changes to an XML document. The incoming document has <?xml version="1.0" ?> at the top.
I'm using xml.etree.ElementTree.
How I'm serializing the changed XML:
filter_update_body = ET.tostring(root, encoding="utf8", method="xml")
The output has this at the top:
<?xml version='1.0' encoding='utf8'?>
The client wants the "encoding" attribute removed, but if I drop the encoding argument, the output either doesn't include the declaration line at all or it contains encoding='us-ascii'.
Can this be done so the output matches <?xml version="1.0" ?>?
(I don't know why it matters honestly but that's what I was told needed to happen)
As pointed out in this answer there is no way to make ElementTree omit the encoding attribute. However, as @James suggested in a comment, it can be stripped from the resulting output like this:
filter_update_body = ET.tostring(root, encoding="utf8", method="xml")
filter_update_body = filter_update_body.replace(b"encoding='utf8'", b"", 1)
The b prefixes are required because ET.tostring() returns a bytes object if encoding != "unicode". In turn, we need to call bytes.replace().
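As a minimal, self-contained sketch of the bytes-based approach: the small filters document below is a hypothetical stand-in for the real XML, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document being modified.
root = ET.fromstring("<filters><filter id='1'>active</filter></filters>")

# encoding="utf8" makes tostring() return bytes that start with
# <?xml version='1.0' encoding='utf8'?>
body = ET.tostring(root, encoding="utf8", method="xml")

# Remove only the first occurrence, so element content is never touched.
stripped = body.replace(b"encoding='utf8'", b"", 1)

# Show just the declaration line of the result.
declaration = stripped.split(b"\n", 1)[0]
print(declaration)
```

Note that the declaration produced this way keeps ElementTree's single quotes around the version value.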
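With encoding="unicode" (note that this is the literal string "unicode"), it returns a regular str instead. In this case, the b prefixes can be omitted and good old str.replace() does the job. Keep in mind that with encoding="unicode" ElementTree writes no XML declaration by default, so there is only something to strip if a declaration was requested explicitly, e.g. with xml_declaration=True (Python 3.8+).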
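If the declaration has to match <?xml version="1.0" ?> character for character, double quotes included, one possible alternative (not the approach described above, just a sketch) is to serialize without any declaration and prepend a hand-written one. The sample document and the DECLARATION constant are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document being modified.
root = ET.fromstring("<filters><filter id='1'>active</filter></filters>")

# encoding="unicode" yields a str and, by default, no XML declaration.
xml_text = ET.tostring(root, encoding="unicode", method="xml")

# Hand-written declaration in exactly the requested form (illustrative name).
DECLARATION = '<?xml version="1.0" ?>'

filter_update_body = DECLARATION + "\n" + xml_text
print(filter_update_body.splitlines()[0])
```

Since the declaration no longer names an encoding, XML parsers will assume UTF-8 unless told otherwise, so the text should be encoded as UTF-8 when it is sent or saved.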
It's worth noting that the choice between bytes and str also affects how the XML will eventually be written to a file. A bytes object should be written in binary mode, a str in text mode.
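The difference between the two modes can be sketched as follows. The temporary directory, the file names, and the sample document are placeholders chosen for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document being modified.
root = ET.fromstring("<filters><filter id='1'>active</filter></filters>")
out_dir = tempfile.mkdtemp()  # placeholder output location

# bytes result: open the file in binary mode ("wb").
as_bytes = ET.tostring(root, encoding="utf8", method="xml")
bytes_path = os.path.join(out_dir, "filters_bytes.xml")
with open(bytes_path, "wb") as f:
    f.write(as_bytes)

# str result: open the file in text mode ("w") and name the encoding
# explicitly instead of relying on the platform default.
as_text = ET.tostring(root, encoding="unicode", method="xml")
text_path = os.path.join(out_dir, "filters_text.xml")
with open(text_path, "w", encoding="utf-8") as f:
    f.write(as_text)
```

Passing a bytes object to a file opened in text mode (or a str to one opened in binary mode) raises a TypeError, which is usually the first symptom of mixing the two up.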