xslt idml

Grouping across children of sibling elements


I'm processing IDML files using XSLT. IDML, which is exported from InDesign, runs consecutive paragraphs of the same style together, separated by <Br/> tags, like this (my input XML):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Story>
    <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/para">
        <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
            <Content>All rights reserved. No part of this publication may be reproduced in any material form (including photocopying or storing it in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication) without the written permission of the copyright owner except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the </Content>
        </CharacterStyleRange>
        <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/italic">
            <Content>Copyright Licensing Agency Ltd, Saffron House, 6-10 Kirby Street, London, EC1N 8TS England</Content>
        </CharacterStyleRange>
        <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
            <Content>. Applications for the copyright owner’s written permission to reproduce any part of this publication should be addressed to the publisher.</Content>
            <Br/>
            <Content>Warning: The doing of an unauthorised act in relation to a copyright work may result in both a civil claim for damages and criminal prosecution.</Content>
            <Br/>
            <Content>Crown copyright material is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland.</Content>
            <Br/>
        </CharacterStyleRange>
    </ParagraphStyleRange>
</Story>

Now, I need to turn this into XML which looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<story> 
    <para>All rights reserved. No part of this publication may be reproduced in any material form (including photocopying or storing it in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication) without the written permission of the copyright owner except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the <italic>Copyright Licensing Agency Ltd, Saffron House, 6-10 Kirby Street, London, EC1N 8TS England</italic>. Applications for the copyright owner’s written permission to reproduce any part of this publication should be addressed to the publisher.</para>
    <para>Warning: The doing of an unauthorised act in relation to a copyright work may result in both a civil claim for damages and criminal prosecution.</para>
    <para>Crown copyright material is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland.</para>
</story>

My XSL looks like this:

<?xml version="1.0" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/Story">
        <story>
            <xsl:apply-templates/>
        </story>
    </xsl:template>

    <xsl:template match="ParagraphStyleRange">
        <xsl:apply-templates>
            <xsl:with-param name="para_style_name" select="replace(./@AppliedParagraphStyle, 'ParagraphStyle/', '')"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="CharacterStyleRange">
        <xsl:param name="para_style_name"/>
        <xsl:variable name="char_style_name" select="replace(./@AppliedCharacterStyle, 'CharacterStyle/', '')"/>
        <xsl:for-each-group select="*" group-ending-with="Br">
            <xsl:element name="{$para_style_name}">
                <xsl:choose>
                    <xsl:when test="$char_style_name = '$ID/[No character style]'">
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:element name="{$char_style_name}">
                            <xsl:apply-templates select="current-group()"/>
                        </xsl:element>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:element>
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="Content">
        <xsl:value-of select="."/>
    </xsl:template>

    <xsl:template match="Br"/>

</xsl:stylesheet>

This almost works, except that the first para is split at the italic part:

<?xml version="1.0" encoding="UTF-8"?>
<story> 
    <para>All rights reserved. No part of this publication may be reproduced in any material form (including photocopying or storing it in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication) without the written permission of the copyright owner except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the </para>
    <para><italic>Copyright Licensing Agency Ltd, Saffron House, 6-10 Kirby Street, London, EC1N 8TS England</italic></para>
    <para>. Applications for the copyright owner’s written permission to reproduce any part of this publication should be addressed to the publisher.</para>
    <para>Warning: The doing of an unauthorised act in relation to a copyright work may result in both a civil claim for damages and criminal prosecution.</para>
    <para>Crown copyright material is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland.</para>
</story>

As expected, my grouping method doesn't work when spanning across the multiple CharacterStyleRange elements in the input. But is there a grouping method where this can work? Or am I better off taking a different approach, such as terminating CharacterStyleRange and ParagraphStyleRange at every Br and re-opening them as an intermediate step to ease processing?


Solution

  • You're almost there. Move the grouping one level up and do it in the template matching ParagraphStyleRange, so that the children of all of its CharacterStyleRange elements are grouped as a single sequence:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    
    <xsl:output indent="yes"/>
    
    <xsl:template match="Story">
        <story>
            <xsl:apply-templates select="ParagraphStyleRange"/>
        </story>
    </xsl:template>
    
    <xsl:template match="ParagraphStyleRange">
        <xsl:variable name="para_style_name" select="replace(@AppliedParagraphStyle, 'ParagraphStyle/', '')"/>
        <xsl:for-each-group select="CharacterStyleRange/*" group-ending-with="Br">
            <xsl:element name="{$para_style_name}">
                <xsl:apply-templates select="current-group()[self::Content]"/>
            </xsl:element>
        </xsl:for-each-group>
    </xsl:template>
    
    <xsl:template match="Content">
        <xsl:variable name="char_style_name" select="replace(../@AppliedCharacterStyle, 'CharacterStyle/', '')"/>
        <xsl:choose>
            <xsl:when test="$char_style_name = '$ID/[No character style]'">
                <xsl:value-of select="."/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:element name="{$char_style_name}">
                    <xsl:value-of select="."/>
                </xsl:element>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    
    </xsl:stylesheet>
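
    To see why grouping at the ParagraphStyleRange level keeps the first paragraph together, it can help to print the groups themselves. The following is only a diagnostic sketch, not part of the solution: the groups, group, n and members names are made up for illustration, and it assumes the same input structure as in the question.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

        <xsl:output indent="yes"/>

        <xsl:template match="/Story">
            <groups>
                <xsl:apply-templates select="ParagraphStyleRange"/>
            </groups>
        </xsl:template>

        <!-- Same population and grouping rule as the stylesheet above -->
        <xsl:template match="ParagraphStyleRange">
            <xsl:for-each-group select="CharacterStyleRange/*" group-ending-with="Br">
                <!-- position() is the index of the current group;
                     members lists the element names in the group, in document order -->
                <group n="{position()}" members="{string-join(current-group()/name(), ' ')}"/>
            </xsl:for-each-group>
        </xsl:template>

    </xsl:stylesheet>
    ```

    For the sample input this should report three groups: "Content Content Content Br", then "Content Br", then "Content Br". The first group spans all three CharacterStyleRange elements because the selected sequence is the flattened list of their children, and a group only closes when a Br is reached.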