I have an XML document with records that have separate fields for subject tags in English and Spanish. Individual tags are separated by semicolons.
<collections>
  <collection name="anyCollection">
    <record>
      <field name="materia">comida; bebida; fiesta</field>
      <field name="subject">food; drink; party</field>
      <field name="recordid">abc0001</field>
    </record>
    <record>
      <field name="materia">comida; bebida; fiesta</field>
      <field name="subject">food; drink; party</field>
      <field name="recordid">abc0002</field>
    </record>
    <record>
      <field name="materia">comida; bebida; fiesta</field>
      <field name="subject">food; drink; party</field>
      <field name="recordid">abc0003</field>
    </record>
    <record>
      <field name="materia">fiesta; sombreros; música; baile; agua; cerveza; sopa</field>
      <field name="subject">party; hats; music; dance; water; beer; soup</field>
      <field name="recordid">abc0004</field>
    </record>
    <record>
      <field name="materia">comida; bebida; fiesta; sombreros; música</field>
      <field name="subject">food; drink; party; hats; music</field>
      <field name="recordid">abc0005</field>
    </record>
    <record>
      <field name="materia">comida; bebida; cerveza; agua</field>
      <field name="subject">food; drink; beer; water</field>
      <field name="recordid">abc0006</field>
    </record>
    <record>
      <field name="materia">fiesta; sombreros; música; baile; agua; cerveza</field>
      <field name="subject">party; hats; music; dance; water; beer</field>
      <field name="recordid">abc0007</field>
    </record>
  </collection>
</collections>
I want to output a text file in which the contents of the two fields are grouped and aligned by position, so that I can confirm that they mirror each other. Here is my current stylesheet. It produces the basic output that I want, but not dynamically: each position (materia[1], subject[1], materia[2], and so on) is hard-coded. What I want is to iterate through the contents of each field by position. I'm guessing I need some kind of recursive template or function, but I'm having trouble figuring it out.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs"
    version="2.0">
  <xsl:variable name="field">
    <xsl:for-each
        select="collections/collection[@name='anyCollection']/record">
      <record>
        <xsl:for-each select="field">
          <field>
            <xsl:for-each select="tokenize(.[@name='materia'],';')">
              <materia>
                <xsl:value-of select="."/>
              </materia>
            </xsl:for-each>
            <xsl:for-each select="tokenize(.[@name='subject'],';')">
              <subject>
                <xsl:value-of select="."/>
              </subject>
            </xsl:for-each>
          </field>
        </xsl:for-each>
      </record>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="align">
    <xsl:for-each select="$field/record/field">
      <languagePair1>
        <xsl:for-each select="materia[1]">
          <xsl:value-of select="."/>
          <xsl:text>_</xsl:text>
        </xsl:for-each>
        <xsl:for-each select="subject[1]">
          <xsl:value-of select="."/>
          <xsl:text> </xsl:text>
        </xsl:for-each>
      </languagePair1>
      <languagePair2>
        <xsl:for-each select="materia[2]">
          <xsl:value-of select="."/>
          <xsl:text>_</xsl:text>
        </xsl:for-each>
        <xsl:for-each select="subject[2]">
          <xsl:value-of select="."/>
          <xsl:text> </xsl:text>
        </xsl:for-each>
      </languagePair2>
    </xsl:for-each>
  </xsl:variable>
  <xsl:template match="/">
    <xsl:for-each-group select="$align/languagePair1" group-by=".">
      <xsl:value-of select="current-grouping-key()"/>
    </xsl:for-each-group>
    <xsl:for-each-group select="$align/languagePair2" group-by=".">
      <xsl:value-of select="current-grouping-key()"/>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
Here is the basic output I want:
comida_food
bebida_drink
fiesta_party
sombreros_hats
música_music
I also need to output the recordids associated with each tag, but I haven't been able to include this in the stylesheet yet.
With that information added, the desired output would look like this:
comida_food
abc0001
abc0002
abc0003
abc0005
abc0006
bebida_drink
abc0001
abc0002
abc0003
abc0005
abc0006
fiesta_party
abc0001
abc0002
abc0003
abc0004
abc0005
abc0007
sombreros_hats
abc0004
abc0005
abc0007
música_music
abc0004
abc0005
abc0007
A nice use case for fn:for-each-pair in XPath 3.0:
for-each-pair(
  tokenize($materia, '; '),
  tokenize($subject, '; '),
  function($x, $y) { $x || '_' || $y || '&#10;' })
Available in Saxon-PE 9.5.1.1.
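To show how the expression above might fit into a complete stylesheet that also lists the recordids, here is a minimal sketch. It assumes an XSLT 3.0 processor with higher-order function support and the input structure from the question. The variable names ($pairs, $id, $materia, $subject) and the temporary <pair> element are made up for illustration; they are not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Hypothetical temporary sequence: one <pair> element per tag
         position per record, carrying that record's id as an attribute. -->
    <xsl:variable name="pairs" as="element(pair)*">
      <xsl:for-each
          select="collections/collection[@name = 'anyCollection']/record">
        <xsl:variable name="id"
            select="string(field[@name = 'recordid'])"/>
        <!-- Split on a semicolon plus any following whitespace, so the
             tokens carry no leading spaces. -->
        <xsl:variable name="materia"
            select="tokenize(field[@name = 'materia'], ';\s*')"/>
        <xsl:variable name="subject"
            select="tokenize(field[@name = 'subject'], ';\s*')"/>
        <!-- Pair the n-th Spanish tag with the n-th English tag. -->
        <xsl:for-each
            select="for-each-pair($materia, $subject,
                      function($x, $y) { $x || '_' || $y })">
          <pair id="{$id}">
            <xsl:value-of select="."/>
          </pair>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>

    <!-- Group identical pairs; print the pair, then every record id
         in which it occurs. -->
    <xsl:for-each-group select="$pairs" group-by="string(.)">
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:for-each select="current-group()">
        <xsl:value-of select="@id"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Two caveats. First, this prints every pair found in the data, so tags beyond the five shown in the question (baile_dance, agua_water, and so on) appear as well. Second, for-each-pair stops at the end of the shorter sequence, so a record whose two fields hold different numbers of tags is truncated silently; if the goal is to verify that the fields mirror each other, comparing count($materia) with count($subject) per record first is probably worthwhile. In XSLT 2.0, where inline functions are unavailable, the same pairing can be written without recursion as for $i in 1 to count($materia) return concat($materia[$i], '_', $subject[$i]).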