groovy, xmlslurper

How to print the path for current XML node in Groovy?


I am iterating through an XML file and want to print the GPath for each node that has a value. I spent a day reading the Groovy API docs and experimenting, but something that seems simple does not appear to be implemented in any obvious way.

Here is some code showing the different things you can get from a NodeChild:

    import groovy.util.XmlSlurper

    def myXmlString = '''
    <transaction>
        <payment>
            <txID>68246894</txID>
            <customerName>Huey</customerName>
            <accountNo type="Current">15778047</accountNo>
            <txAmount>899</txAmount>
        </payment>
        <receipt>
            <txID>68246895</txID>
            <customerName>Dewey</customerName>
            <accountNo type="Current">16288</accountNo>
            <txAmount>120</txAmount>
        </receipt>
        <payment>
            <txID>68246896</txID>
            <customerName>Louie</customerName>
            <accountNo type="Savings">89257067</accountNo>
            <txAmount>210</txAmount>
        </payment>
        <payment>
            <txID>68246897</txID>
            <customerName>Dewey</customerName>
            <accountNo type="Cheque">123321</accountNo>
            <txAmount>500</txAmount>
        </payment>
    </transaction>
    '''

    def transaction = new XmlSlurper().parseText(myXmlString)

    def nodes = transaction.'*'.depthFirst().findAll { it.name() != '' }

    nodes.each { node -> 
        println node
        println node.getClass()
        println node.text()
        println node.name()
        println node.parent()
        println node.children()
        println node.innerText
        println node.GPath
        println node.getProperties()
        println node.attributes()
        node.iterator().each { println "${it.name()} : ${it}" }
        println node.namespaceURI()
        println node.getProperties().get('body').toString()
        println node.getBody()[0].toString()
        println node.attributes()
    }        

I found a post, "groovy Print path and value of elements in xml", that came close to what I need, but it doesn't scale for deep nodes (see output below).

Example code from link:

    transaction.'**'.inject([]) { acc, val -> 
        def localText = val.localText() 
        acc << val.name()

        if( localText ) {
            println "${acc.join('/')} : ${localText.join(',')}"
            acc = acc.dropRight(1) // or acc = acc[0..-2]
        }
        acc
    }

Output of the example code:

    transaction/payment/txID : 68246894
    transaction/payment/customerName : Huey
    transaction/payment/accountNo : 15778047
    transaction/payment/txAmount : 899
    transaction/payment/receipt/txID : 68246895
    transaction/payment/receipt/customerName : Dewey
    transaction/payment/receipt/accountNo : 16288
    transaction/payment/receipt/txAmount : 120
    transaction/payment/receipt/payment/txID : 68246896
    transaction/payment/receipt/payment/customerName : Louie
    transaction/payment/receipt/payment/accountNo : 89257067
    transaction/payment/receipt/payment/txAmount : 210
    transaction/payment/receipt/payment/payment/txID : 68246897
    transaction/payment/receipt/payment/payment/customerName : Dewey
    transaction/payment/receipt/payment/payment/accountNo : 123321
    transaction/payment/receipt/payment/payment/txAmount : 500

Besides getting help to make this work, I would also like to understand why there isn't a simple function like node.path or node.gpath that prints the absolute path to a node.


Solution

  • You could walk up from each leaf node through parent() and join the names along the way:

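    import groovy.util.XmlSlurper
    import groovy.util.slurpersupport.GPathResult
    
    // myXmlString is the XML string defined in the question
    def transaction = new XmlSlurper().parseText(myXmlString)
    
    // Leaf nodes: elements that have no child elements
    def leaves = transaction.depthFirst().findAll { it.children().size() == 0 }
    
    // Build the path by walking parent() up to the root, then reversing it
    def path(GPathResult node) {
        def result = [node.name()]
        // The root's parent() is the root itself. is() tests identity, whereas
        // != goes through GPathResult.equals(), which compares text() and can
        // stop the walk early when a parent has exactly one child.
        def pathWalker = [hasNext: { -> !node.parent().is(node) }, next: { -> node = node.parent() }] as Iterator
        (result + pathWalker.collect { it.name() }).reverse().join('/')
    }
    
    leaves.each { node ->
        println "${path(node)} = ${node.text()}"
    }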
    

    Which gives the output:

    transaction/payment/txID = 68246894
    transaction/payment/customerName = Huey
    transaction/payment/accountNo = 15778047
    transaction/payment/txAmount = 899
    transaction/receipt/txID = 68246895
    transaction/receipt/customerName = Dewey
    transaction/receipt/accountNo = 16288
    transaction/receipt/txAmount = 120
    transaction/payment/txID = 68246896
    transaction/payment/customerName = Louie
    transaction/payment/accountNo = 89257067
    transaction/payment/txAmount = 210
    transaction/payment/txID = 68246897
    transaction/payment/customerName = Dewey
    transaction/payment/accountNo = 123321
    transaction/payment/txAmount = 500
    

    I'm not sure that's what you want, though, since you don't say why the other approach "doesn't scale for deep nodes".
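
If "doesn't scale for deep nodes" refers to the path growing with every sibling container (transaction/payment/receipt/payment/... in the question's output), another option is to walk the tree top-down and hand each child the path of its parent, so that no state is shared between siblings. The following is only a minimal sketch under that assumption: `collectPaths` is a hypothetical helper name, not part of the Groovy API, and the small XML sample (including the `meta`/`note` elements) is made up for illustration.

```groovy
import groovy.util.XmlSlurper   // assumption: Groovy 2.x/3.x; on Groovy 4+ use groovy.xml.XmlSlurper

// Hypothetical helper: recurses from the given node down to the leaves,
// passing the path built so far to each child, and collects one
// "path = text" line per element that has no child elements.
def collectPaths(node, String prefix = '', List<String> out = []) {
    def here = prefix ? "${prefix}/${node.name()}".toString() : node.name()
    def kids = node.children()
    if (kids.size() == 0) {
        out << "${here} = ${node.text()}".toString()
    } else {
        kids.each { child -> collectPaths(child, here, out) }
    }
    out
}

// Made-up sample with a deeper branch (payment/meta/note)
def xml = '''
<transaction>
    <payment>
        <txID>68246894</txID>
        <meta><note>single child</note></meta>
    </payment>
    <receipt>
        <txID>68246895</txID>
    </receipt>
</transaction>
'''

def lines = collectPaths(new XmlSlurper().parseText(xml))
lines.each { println it }
```

Because each call receives its own `prefix`, the path for `receipt/txID` starts again from `transaction` rather than from whatever the previous sibling left behind, and the depth of the tree does not matter. Repeated siblings (several `payment` elements, for example) still produce identical paths, just as in the output above.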