java xml stringescapeutils

Java escape XML token strings


The usual answer to most character-escaping questions is Apache StringEscapeUtils in its commons-text version, and I agree: it can be used to escape the strings between the XML tags. But how do I escape the XML tokens (the tag names) themselves?

The allowed characters are clearly defined: https://www.w3.org/TR/xml11/#sec-common-syn

My use case is converting a database table into XML, where each column name becomes one XML tag name.

<ROW><COL1>Hello</COL1></ROW>

This works fine, but what if the column name is "/BIC/COL1"?

<ROW></BIC/COL1>Hello<//BIC/COL1></ROW>

is obviously not valid. At the moment I do not even have a plan for what the encoding might look like. I would need to use a tag name such as _x2FBIC_x2FCOL1 (0x2F being the code of "/") or something similar.

Am I overlooking anything?


Solution

  • There is no escaping mechanism for XML element names. Some APIs will even reject the name of a new element when it does not match the specification for element names. There are at least two possible solutions to your problem:

    1. Define your own escape mechanism and use it to encode and decode the element name. For example, you could use _ as the escape character: the sequence __ (two underscores) stands for a literal _, and the sequence _XX or _uXXXX stands for the character with that hexadecimal ASCII/Unicode code.

    2. Store the column name in an attribute instead of the element name. An attribute value can hold almost any string, and the XML API of your choice will write it with the proper escaping.
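
    A rough sketch of both options in Java, under a few assumptions: the class name ColumnXml, the method names, and the COL element with its name attribute are made up for illustration. The encoder is deliberately conservative: it keeps only ASCII letters, digits, '-' and '.', and escapes everything else (including ':'), even though the XML specification allows many more characters in names. Option 2 uses the JDK's built-in DOM and Transformer APIs.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; all names here are illustrative only.
final class ColumnXml {

    /** Option 1: turn an arbitrary column name into a valid XML element name. */
    static String encodeName(String column) {
        if (column.isEmpty()) {
            throw new IllegalArgumentException("empty column name");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < column.length(); i++) {
            char c = column.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            // Digits, '-' and '.' are valid in a name, but not as its first character.
            boolean nonStart = (c >= '0' && c <= '9') || c == '-' || c == '.';
            if (c == '_') {
                out.append("__");                               // literal underscore
            } else if (letter || (nonStart && i > 0)) {
                out.append(c);                                  // safe as-is
            } else if (c <= 0xFF) {
                out.append(String.format("_%02X", (int) c));    // _XX
            } else {
                out.append(String.format("_u%04X", (int) c));   // _uXXXX per UTF-16 unit
            }
        }
        return out.toString();
    }

    /** Reverses encodeName; malformed input ends in a runtime exception. */
    static String decodeName(String name) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c != '_') {
                out.append(c);
                i++;
            } else if (i + 1 >= name.length()) {
                throw new IllegalArgumentException("dangling escape in " + name);
            } else if (name.charAt(i + 1) == '_') {
                out.append('_');
                i += 2;
            } else if (name.charAt(i + 1) == 'u') {
                out.append((char) Integer.parseInt(name.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append((char) Integer.parseInt(name.substring(i + 1, i + 3), 16));
                i += 3;
            }
        }
        return out.toString();
    }

    /** Option 2: fixed element name, column name stored in an attribute. */
    static String rowWithAttribute(String column, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element row = doc.createElement("ROW");
            doc.appendChild(row);
            Element col = doc.createElement("COL");
            col.setAttribute("name", column);   // the serializer escapes the value
            col.setTextContent(value);
            row.appendChild(col);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }
}
```

    With these rules, encodeName("/BIC/COL1") yields _2FBIC_2FCOL1 and decodeName turns it back into /BIC/COL1. Option 2 avoids the custom scheme entirely, at the cost of a less readable document in which every column uses the same element name.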