I'm trying to parse the xmlts20130923/xmlconf/xmltest/valid/sa/017a.xml
file from the XML W3C Conformance Test Suite 20130923:
<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
]>
<doc><?pi some data ? > <??></doc>
(See the definition of processing instructions in the XML specification.)
I think the parsed processing instruction should have the data "some data ? > <?",
because the "first" processing instruction isn't closed, due to the whitespace between "?" and ">". Is this assumption correct, or are there two processing instructions, the second of which would have no target and no data?
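The OP's assumption is correct: the PI content is "some data ? > <?". A processing instruction ends only at the literal two-character sequence "?>", so "? >" (with a space in between) does not close it, and the first real terminator is the "?>" at the end of "<??>". There is therefore a single processing instruction with target "pi". This can be checked with lxml: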
from lxml import etree
tree = etree.parse("tmp.xml")
pi = tree.xpath('//processing-instruction()')
pi[0].text
'some data ? > <?'
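
For anyone without lxml installed, the same check can be sketched with the standard library's xml.dom.minidom, which is backed by expat rather than libxml2. The test file is inlined as a string here, and the helper name processing_instructions is made up for this example; treat it as a rough cross-check with a second parser, not a conformance test.

```python
from xml.dom import minidom

# Contents of valid/sa/017a.xml, inlined so the snippet is self-contained.
XML = """<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
]>
<doc><?pi some data ? > <??></doc>"""


def processing_instructions(xml_text):
    """Return (target, data) for each PI directly under the root element.

    Hypothetical helper written for this example only.
    """
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.documentElement.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


pis = processing_instructions(XML)
for target, data in pis:
    print(repr(target), repr(data))
```

If expat agrees with libxml2, this prints exactly one pair: the target 'pi' and the same data string that lxml returned above, rather than two separate processing instructions.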