
Encode local name like XmlConvert.EncodeLocalName in pure XQuery


Hello, I'd like to handle XML files that have encoded node names, for example:

<CST_x002F_SOMETHING>
....
</CST_x002F_SOMETHING>

This node name should be decoded to CST/SOMETHING.

These node names were encoded, for example, via XmlConvert.EncodeName. Is there a built-in XQuery function to decode these names? Or does anyone have an encoding/decoding function?

XML files produced by Oracle Database use the same escaping mechanism.


Solution

  • Use fn:analyze-string() to split the string and match the _xHHHH_ parts, where HHHH is the hexadecimal codepoint. For each match, use bin:hex() to convert the hex digits to binary, then bin:unpack-unsigned-integer() to convert the binary to an integer, and finally fn:codepoints-to-string() to turn the integer codepoint into a string.

    The binary functions are documented at https://www.saxonica.com/documentation/index.html#!functions/expath-binary

    Requires Saxon-PE or higher.

    You could also use the new saxon:replace-with() function:

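    declare namespace bin = 'http://expath.org/ns/binary';
    declare namespace saxon = 'http://saxon.sf.net/';

    (: Replace each _xHHHH_ escape with the character whose codepoint is HHHH :)
    saxon:replace-with('CST_x002F_SOMETHING', '_x[0-9A-F]{4}_',
       function($s) {$s => substring(3, 4)                     (: '_x002F_' -> '002F' :)
                        => bin:hex()                           (: hex digits -> 2 bytes :)
                        => bin:unpack-unsigned-integer(0, 2)   (: bytes -> 47 :)
                        => codepoints-to-string()})            (: 47 -> '/' :)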
    

    This returns CST/SOMETHING.
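    Since the question asks for pure XQuery, the fn:analyze-string() approach can also be sketched with standard XQuery 3.1 functions only, replacing bin:hex() and bin:unpack-unsigned-integer() with a hand-written hex-to-integer fold. This is a minimal sketch, not a drop-in replacement for XmlConvert: the function names local:hex-to-int and local:decode-name are hypothetical, and it assumes only the four-hex-digit _xHHHH_ form shown in the question.

```xquery
xquery version "3.1";

(: Hypothetical helper: converts a string of hex digits to an integer.
   Assumes the input contains only 0-9, A-F, a-f (guaranteed by the regex below). :)
declare function local:hex-to-int($hex as xs:string) as xs:integer {
  fold-left(
    string-to-codepoints(upper-case($hex)),
    0,
    function($acc as xs:integer, $cp as xs:integer) as xs:integer {
      (: index-of is 1-based, so subtract 1 to get the digit value 0..15 :)
      $acc * 16 + index-of(string-to-codepoints('0123456789ABCDEF'), $cp) - 1
    }
  )
};

(: Hypothetical helper: replaces every _xHHHH_ escape with the character it encodes,
   copying all other text unchanged. :)
declare function local:decode-name($name as xs:string) as xs:string {
  string-join(
    for $part in analyze-string($name, '_x([0-9A-Fa-f]{4})_')/*
    return
      if ($part/self::fn:match)
      then codepoints-to-string(local:hex-to-int($part/fn:group[@nr = 1]))
      else string($part)
  )
};

local:decode-name('CST_x002F_SOMETHING')  (: returns 'CST/SOMETHING' :)
```

    Unlike the regex in the saxon:replace-with() example, this pattern also accepts lowercase hex digits. Note that a decoded name such as CST/SOMETHING is no longer a valid XML name, so it is only usable as a string, not as a new element name.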