I have a bunch of XML files I'm using for user interface and string translation in my project, each of which has the following structure:
<?xml version="1.0" encoding="UTF-8" ?>
<messages>
  <message id="x">
    <!-- Text node or arbitrary XHTML markup in here -->
  </message>
  <message id="y">
    <!-- Text node or arbitrary XHTML markup in here -->
  </message>
  <message id="z">
    <!-- Text node or arbitrary XHTML markup in here -->
  </message>
  ...
</messages>
As part of my build process I'd like to "minify" these files into a single XML file, whereby each <message> tag and all of its children are embedded within a single <messages> tag.
My current solution uses grep to strip the XML prolog and the opening and closing messages tags from each file. It then builds a new file by writing the XML prolog and an opening messages tag, appending the stripped contents of every file, and finally appending the closing messages tag. This solution is... rather messy and error-prone.
So, how can I use any command-line XML tools to automate this process? Could I use something like xmlpatterns and/or XSL transforms?
Side question: how would I verify that each <message> tag has an ID attribute, and that all ID attribute values in the final document are unique? I know I can do the first part by means of a DTD, but is the second also in the realm of DTDs or would I need to do something else?
After some research and experimentation, I came up with the following solution:
First I created an XML index file (messages-index.xml) listing all of the XML files I wanted to combine:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="merge-messages.xsl"?>
<bundles>
  <bundle>file1.xml</bundle>
  <bundle>file2.xml</bundle>
  <bundle>file3.xml</bundle>
  ...
</bundles>
Then I wrote an XSL transform (merge-messages.xsl) that selects the <message> tags from each file listed in the index file:
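<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <!-- Emit one <messages> root and pull in the messages of every listed file -->
  <xsl:template match="/bundles">
    <messages>
      <!-- document(bundle) loads each file named by a <bundle> element -->
      <xsl:apply-templates select="document(bundle)/messages/message"/>
    </messages>
  </xsl:template>
  <!-- Copy each <message> unchanged, including its attributes and children -->
  <xsl:template match="message">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>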
I was using Qt in my project, and Qt happens to include a tool called xmlpatterns which can perform XSL transformations. So I was able to include the following command in my build process and have my XML files automatically "minified" on each build.
xmlpatterns merge-messages.xsl messages-index.xml -output messages.xml
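As for the side question, the transform above does not check IDs. Declaring the attribute with type ID in a DTD should make a validating parser reject duplicate values within one document, though ID values must then be valid XML names. A separate check after the merge is another option. The following is only a rough sketch in Python rather than XSLT, not something I use in the build above: the function name check_message_ids and the inline sample document are made up for illustration, and the sample stands in for the merged messages.xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_message_ids(root):
    """Return (count of <message> elements without an id, sorted duplicated ids).

    Hypothetical helper: only direct <message> children of the root are checked.
    """
    ids = [message.get("id") for message in root.findall("message")]
    # A missing or empty id attribute counts as "missing".
    missing = sum(1 for value in ids if not value)
    counts = Counter(value for value in ids if value)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return missing, duplicates


# Small inline document standing in for the merged messages.xml (assumption).
sample = ET.fromstring(
    "<messages>"
    '<message id="x">Hello</message>'
    '<message id="y"><b>World</b></message>'
    '<message id="x">Again</message>'
    "<message>No id</message>"
    "</messages>"
)

missing, duplicates = check_message_ids(sample)
print(missing, duplicates)  # prints: 1 ['x']
```

To run the check on the real output, the root could be loaded with ET.parse("messages.xml").getroot(), and the build could be made to fail whenever missing is non-zero or duplicates is non-empty.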