Using Stax2 to escape special characters in Scala


I am trying to use Stax2 to write XML files whose attribute values contain escaped special characters (character references).

What I am trying to achieve is exactly this output:

<elem1 att1="This &#x0A; That" />

But when I use the usual XMLStreamWriter the output is this:

<elem1 att1="This &amp;#x0A; That" />

So I tried the following with Stax2:

import org.codehaus.stax2.{XMLOutputFactory2}
import org.scalatest.funsuite.AnyFunSuite
import java.io.{File, FileOutputStream}
import javax.xml.stream.{XMLOutputFactory, XMLStreamWriter}

class testStreamXML extends AnyFunSuite{
  val file = new File("stax2test.xml")
  val fileOutputStream = new FileOutputStream(file)
  val outputFactory: XMLOutputFactory2 = XMLOutputFactory.newInstance().asInstanceOf[XMLOutputFactory2]
  //outputFactory.setProperty(XMLOutputFactory2.P_ATTR_VALUE_ESCAPER, true)

  val writer= outputFactory.createXMLStreamWriter(fileOutputStream)

  writer.writeStartDocument()
  writer.writeStartElement("elem1")
  writer.writeAttribute("att1", "This &#x0A; That")
  writer.writeEndElement()
  writer.writeEndDocument()
}

Whenever I try to set the property P_ATTR_VALUE_ESCAPER to true or false, I get this error:

An exception or error caused a run to abort: class java.lang.Boolean cannot be cast to class org.codehaus.stax2.io.EscapingWriterFactory (java.lang.Boolean is in module java.base of loader 'bootstrap'; org.codehaus.stax2.io.EscapingWriterFactory is in unnamed module of loader 'app') 
java.lang.ClassCastException: class java.lang.Boolean cannot be cast to class org.codehaus.stax2.io.EscapingWriterFactory (java.lang.Boolean is in module java.base of loader 'bootstrap'; org.codehaus.stax2.io.EscapingWriterFactory is in unnamed module of loader 'app')
    at com.ctc.wstx.api.WriterConfig.setProperty(WriterConfig.java:401)
    at com.ctc.wstx.api.CommonConfig.setProperty(CommonConfig.java:100)
    at com.ctc.wstx.stax.WstxOutputFactory.setProperty(WstxOutputFactory.java:153)
    at testStreamXML3.<init>(testStreamXML3.scala:10)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
    at java.base/java.lang.reflect.ReflectAccess.newInstance(ReflectAccess.java:128)
    at java.base/jdk.internal.reflect.ReflectionFactory.newInstance(ReflectionFactory.java:350)
    at java.base/java.lang.Class.newInstance(Class.java:645)
    at org.scalatest.tools.Runner$.genSuiteConfig(Runner.scala:1402)
    at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$8(Runner.scala:1199)
    at scala.collection.immutable.List.map(List.scala:246)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1198)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
    at org.scalatest.tools.Runner$.run(Runner.scala:798)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)

Any suggestions on how to resolve this, or on another way to achieve my goal of escaping special characters in attributes?


Solution

  • The property you are referring to requires a value of type EscapingWriterFactory, not a Boolean, which is why the cast fails. From the docs:

    Property that can be set if a custom output escaping for attribute value content is needed. The value set needs to be of type EscapingWriterFactory. When set, the factory will be used to create a per-writer instance used to escape all attribute values written, both via explicit XMLStreamWriter.writeAttribute(java.lang.String, java.lang.String) methods, and via copy methods (XMLStreamWriter2.copyEventFromReader(org.codehaus.stax2.XMLStreamReader2, boolean)).

    As for how to achieve custom escaping: an implementation of this factory does the job. Here is a simple implementation (inspired by "Escaping quotes using jackson-dataformat-xml") that passes everything through to the given writer without applying any escaping. It can serve as a starting point for whatever special case you want to address:

    import com.ctc.wstx.osgi.OutputFactoryProviderImpl
    import org.codehaus.stax2.XMLOutputFactory2
    import org.codehaus.stax2.io.EscapingWriterFactory
    import org.scalatest.funsuite.AnyFunSuite
    import java.io.{File, FileOutputStream, OutputStream, Writer}

    class CustomXmlEscapingWriterFactory extends EscapingWriterFactory {
      // Pass-through writer: forwards attribute values unchanged (no escaping at all).
      // The second parameter is the encoding name, which is unused here.
      override def createEscapingWriterFor(writer: Writer, s: String): Writer =
        new Writer() {
          override def write(cbuf: Array[Char], off: Int, len: Int): Unit =
            writer.write(cbuf, off, len)
          override def flush(): Unit = writer.flush()
          override def close(): Unit = writer.close()
        }

      override def createEscapingWriterFor(outputStream: OutputStream, s: String): Writer =
        throw new IllegalArgumentException("not supported")
    }

    class TestStreamXML extends AnyFunSuite {
      val file = new File("stax2test.xml")
      val fileOutputStream = new FileOutputStream(file)
      val oprovider: OutputFactoryProviderImpl = new OutputFactoryProviderImpl()
      val outputFactory: XMLOutputFactory2 = oprovider.createOutputFactory()
      // your factory implementation goes here as property
      outputFactory.setProperty(XMLOutputFactory2.P_ATTR_VALUE_ESCAPER, new CustomXmlEscapingWriterFactory())

      val writer = outputFactory.createXMLStreamWriter(fileOutputStream)

      writer.writeStartDocument()
      writer.writeStartElement("elem1")
      writer.writeAttribute("att1", "This &#x0A; That")
      writer.writeEndElement()
      writer.writeEndDocument()
      // Release the writer; XMLStreamWriter.close() does not close the underlying stream.
      writer.close()
      fileOutputStream.close()
    }
    

    The resulting output looks like this:

    <?xml version='1.0' encoding='UTF-8'?><elem1 att1="This &#x0A; That"/>
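
The pass-through writer above performs no escaping at all, so an attribute value containing a literal `<`, `"` or a bare `&` would produce malformed XML. As a possible next step, here is a minimal sketch of a writer that escapes those characters but leaves complete numeric character references such as `&#x0A;` untouched. `CharRefPreservingWriter` and `CharRefDemo` are hypothetical names, not part of Stax2 or Woodstox, and the sketch assumes each attribute value arrives in a single `write` call; a reference split across two calls would not be recognized.

```scala
import java.io.{StringWriter, Writer}

// Hypothetical helper (not a Stax2/Woodstox class): escapes attribute text
// but keeps well-formed numeric character references like &#x0A; or &#10;.
class CharRefPreservingWriter(out: Writer) extends Writer {
  // A complete numeric character reference, hexadecimal or decimal.
  private val charRef = "&#(x[0-9A-Fa-f]+|[0-9]+);".r

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val text = new String(cbuf, off, len)
    val sb = new StringBuilder
    var i = 0
    while (i < text.length) {
      text.charAt(i) match {
        case '&' =>
          // Copy the reference verbatim if one starts here, else escape the ampersand.
          charRef.findPrefixOf(text.subSequence(i, text.length)) match {
            case Some(ref) =>
              sb.append(ref)
              i += ref.length - 1
            case None =>
              sb.append("&amp;")
          }
        case '<' => sb.append("&lt;")
        case '"' => sb.append("&quot;")
        case c   => sb.append(c)
      }
      i += 1
    }
    out.write(sb.toString)
  }

  override def flush(): Unit = out.flush()
  override def close(): Unit = out.close()
}

object CharRefDemo {
  // Runs a string through the writer and returns what reached the sink.
  def escape(s: String): String = {
    val sink = new StringWriter
    val w = new CharRefPreservingWriter(sink)
    w.write(s)
    w.flush()
    sink.toString
  }

  def main(args: Array[String]): Unit = {
    println(escape("This &#x0A; That"))    // prints: This &#x0A; That
    println(escape("Tom & \"Jerry\" <3"))  // prints: Tom &amp; &quot;Jerry&quot; &lt;3
  }
}
```

To plug it in, `createEscapingWriterFor(writer, s)` in the factory above could return `new CharRefPreservingWriter(writer)` instead of the anonymous pass-through writer. The demo itself only uses `java.io`, so it runs without Woodstox on the classpath.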