There are similar questions on Stack Overflow, but their answers sort or group the matching items, which puts them out of sequence relative to the order of the input.
I have the following data:
<doc>
<paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
<paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
<paragraph indent="0" stylename="list" SDgroup="list">AAA. Para 1.</paragraph>
<paragraph indent="1" stylename="list" SDgroup="list">BBB. Para 2</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
<paragraph indent="2" stylename="list" SDgroup="list">CCC. Para 3a</paragraph>
<paragraph indent="2" stylename="list" SDgroup="list">DDD. Para 3b/4</paragraph>
<paragraph indent="1" stylename="list" SDgroup="list">EEE. Para 5</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
<paragraph indent="0" stylename="list" SDgroup="list">FFF. Para 6</paragraph>
</doc>
I need to nest each paragraph[@stylename='list'] together with the paragraph[@stylename='continued'] elements that follow it, where the list paragraph has @indent = x and its continued paragraphs have @indent = x + 1.
The output should look like:
<doc>
<paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
<paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
<List indent="0" maxItems="2">
<paragraph stylename="list_unordered" SDgroup="list">AAA. Para 1.</paragraph>
<List indent="1" maxItems="1">
<paragraph indent="1" stylename="list_unordered" SDgroup="list">BBB. Para 2</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
<List indent="2" maxItems="2">
<paragraph indent="2" stylename="list_unordered" SDgroup="list">CCC. Para 3a</paragraph>
<paragraph indent="2" stylename="list_unordered" SDgroup="list">DDD. Para 3b/4</paragraph>
</List>
<paragraph indent="1" stylename="list_unordered" SDgroup="list">EEE. Para 5</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
</List>
<paragraph indent="0" stylename="list_unordered" SDgroup="list">FFF. Para 6</paragraph>
</List>
</doc>
I have tried many versions of for-each-group with group-by/group-adjacent, as well as a regular for-each, but I end up with out-of-sequence output and/or correctly nested output that contains duplicate data (i.e. an item is correctly nested and then, at some point, the same item is output again). I believe this occurs with multiply nested for-each loops.
I'm 99.9% sure this is doable, but I can't find a solution that produces what I need. Once again, I'm asking for help, which will, of course, be greatly appreciated.
FYI: I have no control over the input data and no means to change it, unless it's within XSLT.
Following your comment, it's a little easier to understand your output; however, I suggest that your provided output doesn't follow your description, because the maxItems for the second List element (just after AAA) should be 2 and not 1, since there are two list_unordered elements that are immediate children of that List element.
I've reworked my example implementation:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:l="local:functions"
                exclude-result-prefixes="#all">

  <xsl:output indent="yes" omit-xml-declaration="yes" />

  <!-- Anything without a more specific template is copied unchanged. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Start at level -1 so that a list paragraph at indent 0 opens a List. -->
  <xsl:template match="/doc" >
    <xsl:copy>
      <xsl:sequence select="l:processParagraphs('-1', *)" />
    </xsl:copy>
  </xsl:template>

  <!-- Walks the paragraphs in document order. A list paragraph that is
       deeper than the current level opens a new List wrapper. -->
  <xsl:function name="l:processParagraphs" as="element()*">
    <xsl:param name="indentLevel" as="xs:string" />
    <xsl:param name="paras" as="element(paragraph)*" />
    <xsl:choose>
      <xsl:when test="head($paras)/@stylename eq 'list'
                      and
                      xs:integer(head($paras)/@indent) gt xs:integer($indentLevel)">
        <!-- The leading run of paragraphs at this indent or deeper. -->
        <xsl:variable name="itemsUnderList" as="element(paragraph)*"
                      select="l:nextInList(head($paras)/@indent, $paras)" />
        <xsl:variable name="processedInner" as="element()*"
                      select="l:processInner(head($paras)/@indent, $itemsUnderList)" />
        <!-- maxItems counts only the list items that are immediate children. -->
        <List indent="{head($paras)/@indent}" maxItems="{count($processedInner[@stylename eq 'list_unordered'])}">
          <xsl:sequence select="$processedInner" />
        </List>
        <!-- Continue with whatever follows this list at the current level. -->
        <xsl:sequence select="l:processParagraphs($indentLevel, $paras except $itemsUnderList)" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="l:processInner($indentLevel, $paras)" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- Outputs the first paragraph, then hands the rest back to l:processParagraphs. -->
  <xsl:function name="l:processInner" as="element()*">
    <xsl:param name="indentLevel" as="xs:string" />
    <xsl:param name="paras" as="element(paragraph)*" />
    <xsl:apply-templates select="head($paras)" /> <!-- to replace list stylename -->
    <xsl:if test="tail($paras)" >
      <xsl:sequence select="l:processParagraphs($indentLevel, tail($paras))" />
    </xsl:if>
  </xsl:function>

  <xsl:template match="paragraph/@stylename[. eq 'list']">
    <xsl:attribute name="stylename" select="'list_unordered'" />
  </xsl:template>

  <!-- Returns the leading paragraphs whose indent is at least $indentLevel,
       stopping at the first shallower paragraph. -->
  <xsl:function name="l:nextInList" as="element(paragraph)*" >
    <xsl:param name="indentLevel" as="xs:string" />
    <xsl:param name="paras" as="element(paragraph)*" />
    <xsl:if test="xs:integer(head($paras)/@indent) ge xs:integer($indentLevel)" >
      <xsl:sequence select="(head($paras),
                             l:nextInList($indentLevel, tail($paras)))" />
    </xsl:if>
  </xsl:function>

</xsl:stylesheet>
which produces:
<doc>
  <paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
  <paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
  <List indent="0" maxItems="2">
    <paragraph indent="0" stylename="list_unordered" SDgroup="list">AAA. Para 1.</paragraph>
    <List indent="1" maxItems="2">
      <paragraph indent="1" stylename="list_unordered" SDgroup="list">BBB. Para 2</paragraph>
      <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
      <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
      <List indent="2" maxItems="2">
        <paragraph indent="2" stylename="list_unordered" SDgroup="list">CCC. Para 3a</paragraph>
        <paragraph indent="2" stylename="list_unordered" SDgroup="list">DDD. Para 3b/4</paragraph>
      </List>
      <paragraph indent="1" stylename="list_unordered" SDgroup="list">EEE. Para 5</paragraph>
      <paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
    </List>
    <paragraph indent="0" stylename="list_unordered" SDgroup="list">FFF. Para 6</paragraph>
  </List>
</doc>
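If you want to double-check the maxItems point mechanically, here is a minimal sketch of a separate checker stylesheet. It is not part of the solution above and assumes it is run against the transformed output (not the original input). It reports every List whose maxItems differs from the number of list_unordered paragraphs that are its immediate children.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.0"
                exclude-result-prefixes="#all">

  <!-- Plain-text report: one line per mismatching List, nothing if all agree. -->
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:for-each select="//List">
      <!-- Count only immediate children; items in nested Lists are excluded. -->
      <xsl:variable name="actual" as="xs:integer"
                    select="count(paragraph[@stylename eq 'list_unordered'])" />
      <xsl:if test="xs:integer(@maxItems) ne $actual">
        <xsl:value-of select="'List indent=' || @indent
                              || ': maxItems=' || @maxItems
                              || ', immediate list items=' || $actual
                              || '&#10;'" />
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against the output above, this should print nothing. Run against the desired output in the question, it should flag the List with indent="1", which declares maxItems="1" but has two immediate list_unordered children (BBB and EEE).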