
Concise way to "minify" XML using command line tools?


I have a bunch of XML files that I use for user interface text and string translation in my project. Each of them has the following structure:

<?xml version="1.0" encoding="UTF-8" ?>
<messages>
    <message id="x">
        <!-- Text node or arbitrary XHTML markup in here -->
    </message>
    <message id="y">
        <!-- Text node or arbitrary XHTML markup in here -->
    </message>
    <message id="z">
        <!-- Text node or arbitrary XHTML markup in here -->
    </message>
    ...
</messages>

As part of my build process I'd like to "minify" these files into a single XML file, in which every <message> element from every file, together with all of its children, sits inside one <messages> root element.

My current solution uses grep to strip the XML prolog and the opening and closing <messages> tags from each file. I build the output by writing the XML prolog and an opening <messages> tag to a new file, appending the stripped contents of each file, and finally appending the closing </messages> tag. This solution is... rather messy and error-prone.

So, which command-line XML tools could I use to automate this process? Could I use something like xmlpatterns and/or XSL transforms?

Side question: how would I verify that each <message> tag has an id attribute, and that all id values in the final document are unique? I know I can enforce the first part with a DTD, but can a DTD also handle the second, or would I need something else?
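For the side question: a DTD can cover both parts. Declaring the attribute as <!ATTLIST message id ID #REQUIRED> makes a validating parser (for example xmllint --valid) report both a missing id and a duplicate id, because an attribute of type ID must be unique within a document. The catch is that ID values must be valid XML names, so an id such as "1x" would be rejected. If that restriction is a problem, the check can instead be sketched with Python's standard library. The function names and the script name check_ids.py are hypothetical, and the sketch assumes the merged file uses the <messages>/<message id="..."> structure shown above.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def check_message_ids(root):
    """Return a list of problems in a parsed <messages> element (hypothetical helper).

    Checks that every direct <message> child has a non-empty id attribute and
    that no id value is used more than once.
    """
    problems = []
    ids = []
    for position, message in enumerate(root.findall("message"), start=1):
        message_id = message.get("id")
        if not message_id:
            problems.append(f"message #{position} has no id attribute")
        else:
            ids.append(message_id)
    # Counter tallies each id; sorting keeps the report order stable
    for message_id, count in sorted(Counter(ids).items()):
        if count > 1:
            problems.append(f"id {message_id!r} is used {count} times")
    return problems


def main(paths):
    """Check each file in paths and return an exit status (0 means no problems)."""
    status = 0
    for path in paths:
        for problem in check_message_ids(ET.parse(path).getroot()):
            print(f"{path}: {problem}", file=sys.stderr)
            status = 1
    return status


# Run only when file names are given, e.g. python check_ids.py messages.xml
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Because main returns a non-zero status when it finds a problem, a build step that runs it fails on a missing or duplicated id.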


Solution

  • After some research and experimentation, I came up with the following solution:

    First I created an XML index file listing all of the XML files I wanted to combine:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="merge-messages.xsl"?>
    <bundles>
        <bundle>file1.xml</bundle>
        <bundle>file2.xml</bundle>
        <bundle>file3.xml</bundle>
        ...
    </bundles>
    

    Then I wrote an XSL transform that copies the <message> elements from each file listed in the index file into a single <messages> element:

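    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <!-- Keep the XML declaration; indent="yes" lets the processor add
             indentation whitespace, so use "no" if output size matters -->
        <xsl:output omit-xml-declaration="no" indent="yes"/>
    
        <!-- Wrap every <message> from every listed file in one <messages> root.
             document(bundle) loads each file named by a <bundle> element,
             resolving relative names against the index file's location. -->
        <xsl:template match="/bundles">
            <messages>
                <xsl:apply-templates select="document(bundle)/messages/message"/>
            </messages>
        </xsl:template>
    
        <!-- Copy each <message> unchanged, with its attributes and all descendants -->
        <xsl:template match="message">
            <xsl:copy-of select="."/>
        </xsl:template>
    </xsl:stylesheet>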
    

    I was using Qt in my project, and Qt happens to ship a command-line tool called xmlpatterns that can perform XSL transformations. That let me add the following command to my build process, so my XML files are automatically "minified" on every build.

    xmlpatterns merge-messages.xsl messages-index.xml -output messages.xml
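    If xmlpatterns (or another XSLT processor) isn't available, the same merge can be sketched with Python's standard xml.etree.ElementTree module instead of XSLT. The script name merge_messages.py and the functions merge_roots and merge_index are hypothetical, and the sketch assumes the index file and message files look like the examples above.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def merge_roots(roots):
    """Copy the <message> children of each <messages> root into one new <messages> element (hypothetical helper)."""
    merged = ET.Element("messages")
    for root in roots:
        # Like the XPath /messages/message, ignore files whose root is not <messages>
        if root.tag == "messages":
            merged.extend(root.findall("message"))
    return merged


def merge_index(index_path):
    """Parse every file named by a <bundle> element of the index file (hypothetical helper).

    Bundle names are resolved relative to the index file's directory, which is
    how document(bundle) resolves them in the stylesheet.
    """
    index_path = Path(index_path)
    bundles = ET.parse(index_path).getroot().findall("bundle")
    roots = (ET.parse(index_path.parent / b.text.strip()).getroot() for b in bundles)
    return ET.ElementTree(merge_roots(roots))


# Usage: python merge_messages.py messages-index.xml messages.xml
if __name__ == "__main__" and len(sys.argv) == 3:
    merge_index(sys.argv[1]).write(sys.argv[2], encoding="UTF-8", xml_declaration=True)
```

    Unlike the XSLT, ElementTree drops XML comments by default and may rewrite namespace prefixes (for example on XHTML content) when it serializes, so the stylesheet remains the more faithful option when messages contain comments or namespaced markup.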