
Parsing XML Processing Instruction


I'm trying to parse the xmlts20130923/xmlconf/xmltest/valid/sa/017a.xml file from the XML W3C Conformance Test Suite 20130923:

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
]>
<doc><?pi some data ? > <??></doc>

Processing Instructions Definition

I think the parsed processing instruction should have the data "some data ? > <?", because the "? >" does not close the first processing instruction due to the whitespace. Is this assumption correct, or are there two processing instructions, the second of which has no target and no data?


Solution

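  • OP's assumption is correct. A processing instruction ends only at the first literal "?>" sequence, so the "? >" with a space in between does not close it. There is a single PI with target "pi", and its content is "some data ? > <?".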

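    from lxml import etree

    # tmp.xml contains the test document from the question
    tree = etree.parse("tmp.xml")
    pi = tree.xpath('//processing-instruction()')
    print(len(pi))       # 1
    print(pi[0].target)  # pi
    print(pi[0].text)    # some data ? > <?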
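
If lxml is not available, the same check can be sketched with the standard library only. This is a minimal example, assuming the document is embedded as a string rather than read from tmp.xml; the helper name processing_instructions is made up for illustration and is not part of any library.

```python
from xml.dom import Node, minidom

# The test document from the question, embedded as a string (assumption:
# same content as xmltest/valid/sa/017a.xml).
DOC = """<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
]>
<doc><?pi some data ? > <??></doc>"""


def processing_instructions(xml_text):
    """Return (target, data) pairs for the PIs directly inside the root element."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.documentElement.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


if __name__ == "__main__":
    for target, data in processing_instructions(DOC):
        print(repr(target), repr(data))
```

The parser should report exactly one processing instruction here, which matches the lxml result: the "? >" in the middle is ordinary PI data, and the trailing "<??>" splits into the data characters "<?" followed by the closing "?>".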