groovy character-encoding xml-parsing xml-declaration

Parse XML using Groovy: Override charset in declaration and add XML processing instruction


My initial question has been answered, but that just opened up further issues.

Example code

Using Groovy 2.0.5 on JVM 1.6.0_31

import groovy.xml.*
import groovy.xml.dom.DOMCategory

def xml = '''<?xml version="1.0" encoding="UTF-16"?>
            |<?xml-stylesheet type="text/xsl" href="Bp8DefaultView.xsl"?>
            |<root>
            |  <Settings>
            |    <Setting name="CASEID_SEQUENCE_SIZE">
            |      <HandlerURL>
            |        <![CDATA[ admin/MainWindow.jsp ]]>
            |      </HandlerURL>
            |    </Setting>
            |    <Setting name="SOMETHING_ELSE">
            |      <HandlerURL>
            |        <![CDATA[ admin/MainWindow.jsp ]]>
            |      </HandlerURL>
            |    </Setting>
            |  </Settings>
            |</root>'''.stripMargin()

def document = DOMBuilder.parse( new StringReader( xml ) )
def root = document.documentElement

// Edit: Added the line below 
def pi = document.createProcessingInstruction('xml-stylesheet', 'type="text/xsl" href="Bp8DefaultView.xsl"');
// Edit #2: Added line below
document.insertBefore(pi, root)

use(DOMCategory) {
  root.Settings.Setting.each {
    if( it.'@name' == 'CASEID_SEQUENCE_SIZE' ) {
      it[ '@value' ] = 100
    }
  }
}

def outputfile = new File( 'c:/temp/output.xml' )
XmlUtil.serialize( root , new PrintWriter(outputfile))
// Edit #2: Changed from root to document.documentElement to see if that 
// would make any difference
println XmlUtil.serialize(document.documentElement)

Problem description

I'm trying to parse an XML file exported from a third-party tool, and before promoting it to staging and production I need to replace certain attribute values. That part works, but in addition I must keep the encoding and the reference to the stylesheet.

Since both are pretty static, it is totally OK to keep the encoding and the stylesheet reference in a property file. In other words, I do not need to read the declarations from the original file first.
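To illustrate the property-file idea, here is a minimal sketch. The property names (output.encoding, stylesheet.target, stylesheet.data) are hypothetical, and the properties are loaded from an inline string only to keep the example self-contained; a real script would load them from a file.

```groovy
// Hypothetical property names; in practice these lines would live in a .properties file.
def props = new Properties()
props.load(new StringReader('''
output.encoding=UTF-16
stylesheet.target=xml-stylesheet
stylesheet.data=type="text/xsl" href="Bp8DefaultView.xsl"
'''))

// Only the first '=' on a line separates key and value, so the data string survives intact.
def encoding = props.getProperty('output.encoding')
def stylesheetTarget = props.getProperty('stylesheet.target')
def stylesheetData = props.getProperty('stylesheet.data')

println "encoding=${encoding}, target=${stylesheetTarget}, data=${stylesheetData}"
```

The two stylesheet values map directly onto the two arguments of createProcessingInstruction.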

Encoding in declaration issue

As shown in this answer found here on Stack Overflow, you can do

new File('c:/data/myutf8.xml').write(f,'utf-8')

I have also tried

XmlUtil.serialize( root , new GroovyPrintStream('c:/temp/output.txt', 'utf-16'))

but that did not solve the problem for me either. So I still do not understand how to override the encoding value in the declaration.
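For comparison, a minimal sketch of one way the declared encoding could be controlled, assuming the JDK's built-in javax.xml.transform.Transformer is used instead of XmlUtil. The input string is a hypothetical stand-in for the exported file. The encoding is passed as an output property and the result is written to a byte stream, so the serializer decides both the declaration and the byte encoding.

```groovy
import groovy.xml.DOMBuilder
import javax.xml.transform.OutputKeys
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult

// Hypothetical minimal input standing in for the third-party export.
def source = '<root><Settings><Setting name="CASEID_SEQUENCE_SIZE"/></Settings></root>'
def document = DOMBuilder.parse(new StringReader(source))

def transformer = TransformerFactory.newInstance().newTransformer()
// Ask the serializer to declare UTF-16 and to encode the output accordingly.
transformer.setOutputProperty(OutputKeys.ENCODING, 'UTF-16')

// Write to an OutputStream (not a Writer) so the serializer controls the charset.
def bytes = new ByteArrayOutputStream()
transformer.transform(new DOMSource(document), new StreamResult(bytes))

// Decode again only to inspect the result.
def text = new String(bytes.toByteArray(), 'UTF-16')
println text
```

To write a file instead, the ByteArrayOutputStream could be replaced with a FileOutputStream, closed after the transform.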

Processing instruction issue

Simply, how do I add

<?xml-stylesheet type="text/xsl" href="Bp8DefaultView.xsl"?>

to the output?

Update: the processing instruction can be created like this

def pi = document.createProcessingInstruction('xml-stylesheet', 'type="text/xsl" href="Bp8DefaultView.xsl"');

Following this guideline, I insert the processing instruction like this, but it still does not show up in the output.

document.insertBefore(pi, root) // Fails

Solution

  • All issues in this question have been answered in another question I raised, see Groovy and XML: Not able to insert processing instruction

    The trick is that I expected

    document.documentElement
    

    to contain the processing instruction. But that is wrong; documentElement is:

    ...This is a convenience attribute that allows direct access to the child node that is the document element of the document...

    The processing instruction is just another child node of the document, a sibling of the document element rather than part of it. So what I had to use instead was either the LSSerializer or the Transformer, serializing the whole document. See Serialize XML processing instruction before root element for details.
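    A minimal sketch of the difference, assuming the parser's DOM implementation supports the DOM Level 3 Load and Save feature (the JDK's default implementation does). The input string is a shortened, hypothetical version of the example above. Serializing the document includes the processing instruction; serializing only documentElement does not.

```groovy
import groovy.xml.DOMBuilder
import org.w3c.dom.ls.DOMImplementationLS

// Shortened, hypothetical input.
def document = DOMBuilder.parse(new StringReader('<root><Settings/></root>'))
def root = document.documentElement

def pi = document.createProcessingInstruction(
        'xml-stylesheet', 'type="text/xsl" href="Bp8DefaultView.xsl"')
document.insertBefore(pi, root)

// The processing instruction is a child of the document, next to the root element.
def children = document.childNodes
def childNames = (0..<children.length).collect { children.item(it).nodeName }
println childNames

def ls = (DOMImplementationLS) document.implementation.getFeature('LS', '3.0')
def serializer = ls.createLSSerializer()

// Serialize the whole document versus the root element only.
def wholeDocument = serializer.writeToString(document)
def rootOnly = serializer.writeToString(root)

println wholeDocument
println rootOnly
```

    Note that writeToString produces a String, so it is not the way to pick a file encoding; for that, the serializer's write method with an LSOutput (or the Transformer) would be the place to set it.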