I'm trying to use nul (U+0000) to delimit XML values in xmlstarlet output. xmlstarlet ignores -o '', -o $'\0', and -o '\0'.
I'm aware that I can use other characters like the various field separators to delimit output. The problem with this approach is that these characters can also exist as data. I don't want any ambiguity.
I want to use nul specifically because it's the only value that can't be represented in raw XML.
So, to repeat my question: How do I separate xmlstarlet output with nul?
I've included the following information at the request of commenters. While I appreciate the desire to help, please avoid suggesting XY solutions. I'm only looking for an answer to my question as presented.
The data I'm working with looks like this:
<data>
<datapoint attribute-1="val-1" attribute-2="val-a" />
<datapoint attribute-1="val-2" attribute-2="val-b" />
<datapoint attribute-1="val-3">
<sub-datapoint />
</datapoint>
</data>
The way I'm trying to use xmlstarlet:
mapfile -d '' -t ARRAY < <( xmlstarlet sel -t -m /data/datapoint -o 'datapoint' -o $'\0' -v ./@attribute-1 -o $'\0' data.xml )
A hexdump of the output I'm looking for:
64 61 74 61 70 6f 69 6e 74 00 76 61 6c 2d 31 00 |datapoint.val-1.|
64 61 74 61 70 6f 69 6e 74 00 76 61 6c 2d 32 00 |datapoint.val-2.|
64 61 74 61 70 6f 69 6e 74 00 76 61 6c 2d 33 00 |datapoint.val-3.|
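Unfortunately, xmlstarlet doesn't seem to be capable of producing nul in its output. Note that -o $'\0' cannot work from bash in any case: command-line arguments are nul-terminated strings, so $'\0' reaches xmlstarlet as an empty string, the same as -o ''.
xmlstarlet is, however, capable of producing U+FFFF, a code point that is not a valid character in any XML version. You can use this code point to safely delimit XML values, and then use another program to replace it with nul: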
xmlstarlet sel -t \
-m /data/datapoint \
-o 'datapoint' \
-o $'\uffff' \
-v ./@attribute-1 \
-o $'\uffff' data.xml \
| python3 -c 'import sys;
sys.stdout.write(sys.stdin.read().replace("\uffff", "\0"))'
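To consume the result, the nul-delimited stream can be read into a bash array. The following is a minimal sketch, not a definitive implementation: emit_fields is a hypothetical stand-in for the xmlstarlet command above (so the example runs without xmlstarlet or data.xml), to_nul is a hypothetical helper name, and the replacement is done on raw bytes under the assumption that the output is UTF-8, where U+FFFF is encoded as EF BF BF. mapfile -d requires bash 4.4 or later.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the xmlstarlet pipeline: every field is
# terminated by U+FFFF, written here as its UTF-8 bytes EF BF BF.
emit_fields() {
  printf 'datapoint\xef\xbf\xbfval-1\xef\xbf\xbf'
  printf 'datapoint\xef\xbf\xbfval-2\xef\xbf\xbf'
}

# Replace the UTF-8 encoding of U+FFFF with nul, working on bytes so the
# result does not depend on the current locale.
to_nul() {
  python3 -c 'import sys
data = sys.stdin.buffer.read()
sys.stdout.buffer.write(data.replace(b"\xef\xbf\xbf", b"\0"))'
}

# -d "" makes mapfile split on nul; -t strips the delimiter from each element.
mapfile -d '' -t fields < <(emit_fields | to_nul)

printf '%s\n' "${#fields[@]}"    # number of fields read
printf '[%s]\n' "${fields[@]}"   # one bracketed field per line
```

With the real command, replace the body of emit_fields with the xmlstarlet invocation. If the real output ends with a trailing newline after the last delimiter, that newline would show up as one extra array element.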