Tags: python, xpath, saxon, xpath-2.0, saxon-c

How do I get the line numbers of a saxonc XPath match?


I'm building a report that shows the line numbers of XML elements matching a set of XPaths. I need to support XPath 2.0. Sending the XML to a separate web-based processor written in Java or C# would work, but I'm avoiding that because my entire team works in Python, I want the tool to keep working offline, and maintaining another web service is a lot of work.

SaxonC HE (the saxonche package) supports XPath 2.0. The documentation describes multiple options for enabling line numbers, but never explains how to read the line numbers back once you have enabled them.

Here's my code:

from saxonche import PySaxonProcessor

input_file_path = 'test.xml'  # Contents below
input_xpath = './/foo'

with PySaxonProcessor(license=False) as saxon_proc:

    # Attempt #1 to enable line numbers
    saxon_proc.set_configuration_property('l', 'on')  

    doc_builder = saxon_proc.new_document_builder()
    # Attempt #2 to enable line numbers
    doc_builder.set_line_numbering(True)

    xml_tree = doc_builder.parse_xml(xml_file_name=input_file_path)
    xpath_processor = saxon_proc.new_xpath_processor()
    xpath_processor.set_context(xdm_item=xml_tree)

    foo_elements = xpath_processor.evaluate(input_xpath)
    # Do not see any line numbers on foo_elements in the debugger

I inspected the result of evaluate() in the debugger, but I don't see anything that looks like a line number.

(Screenshot: debugger inspection of the evaluate() result)

Both PySaxonProcessor and PyDocumentBuilder have a parse_xml() method. In my code I am using PyDocumentBuilder, but I tried both and didn't notice any differences.

test.xml:

<root>
    <foo>fah</foo>
</root>

Apparently there are wrong ways to feed your XML to Saxon that can result in no line numbers, but all of the information I found about that is in other languages.

Any ideas about what I am doing wrong?


Solution

  • I'm afraid I can't currently tell whether there is a way to do this with SaxonC HE. With PE/EE you should be able to use the Saxon XPath extension function saxon:line-number, e.g.:

    from saxoncee import PySaxonProcessor
    
    with PySaxonProcessor(license=True) as saxon_proc:
        print(saxon_proc.version)
    
        doc_builder = saxon_proc.new_document_builder()
        doc_builder.set_line_numbering(True)
    
        xdm_doc = doc_builder.parse_xml(xml_file_name='sample1.xml')
    
        xpath_processor = saxon_proc.new_xpath_processor()
    
        xpath_processor.set_context(xdm_item=xdm_doc)
    
        xpath_processor.declare_namespace('saxon', 'http://saxon.sf.net/')
    
        items = xpath_processor.evaluate('//item')
    
        for item in items:
            xpath_processor.set_context(xdm_item=item)
            print(item, xpath_processor.evaluate_single('saxon:line-number(.)'))
    

    As I said, I'm currently not sure whether there is a way to do this with SaxonC HE; I will try to investigate. However, PyXdmNode (https://www.saxonica.com/saxon-c/doc12/html/saxonc.html#PyXdmNode) doesn't seem to expose any property similar to getLineNumber() on the Java API's XdmNode (https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/s9api/XdmNode.html#getLineNumber()).
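
Until an HE-side API turns up, one possible workaround is to record the line numbers yourself before Saxon sees the document. The sketch below uses only the Python standard library (xml.sax) to copy the XML and stamp every element with its source line in an attribute. The attribute name `data-line` and the helper `annotate_line_numbers` are hypothetical names chosen for illustration, not part of SaxonC. I have not tested the Saxon side, but the annotated string could presumably be passed to `parse_xml(xml_text=...)` and the number read back with an XPath such as `.//foo/@data-line`.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical attribute name; pick one that cannot clash with your data.
LINE_ATTR = "data-line"


class LineAnnotator(XMLGenerator):
    """Re-serialises the parsed XML, adding the source line to each element."""

    def startElement(self, name, attrs):
        annotated = dict(attrs.items())
        # self._locator is set by the SAX parser before parsing starts and
        # reports the line on which the current start tag begins.
        annotated[LINE_ATTR] = str(self._locator.getLineNumber())
        super().startElement(name, AttributesImpl(annotated))


def annotate_line_numbers(xml_text: str) -> str:
    """Return a copy of xml_text with a line-number attribute on every element."""
    out = io.StringIO()
    handler = LineAnnotator(out, encoding="utf-8")
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()


if __name__ == "__main__":
    sample = "<root>\n    <foo>fah</foo>\n</root>\n"
    print(annotate_line_numbers(sample))
```

The trade-off is that the document Saxon evaluates is no longer identical to the file on disk: every element gains an extra attribute, and comments are dropped by this simple handler, so XPaths that count attributes or select comments would behave differently.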