xslt-2.0 xslt-3.0 xslt-grouping

XSLT 2/3: nested grouping with for-each-group while maintaining input order


There are similar questions on Stack Overflow, but the answers sort or group the matching items, which puts them out of sequence relative to the input order.

I have the following data

<doc>
<paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
<paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
<paragraph indent="0" stylename="list" SDgroup="list">AAA. Para 1.</paragraph>
<paragraph indent="1" stylename="list" SDgroup="list">BBB. Para 2</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
<paragraph indent="2" stylename="list" SDgroup="list">CCC. Para 3a</paragraph>
<paragraph indent="2" stylename="list" SDgroup="list">DDD. Para 3b/4</paragraph>
<paragraph indent="1" stylename="list" SDgroup="list">EEE. Para 5</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
<paragraph indent="0" stylename="list" SDgroup="list">FFF. Para 6</paragraph>
</doc>

I need to nest each paragraph[@stylename='list'] and the following paragraph[@stylename='continued'] elements, where the list paragraph has @indent = x and its continued paragraphs have @indent = x + 1.

The output should look like:

<doc>
  <paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
  <paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>

  <List indent="0" maxItems="2">
    <paragraph stylename="list_unordered" SDgroup="list">AAA. Para 1.</paragraph>
      <List indent="1" maxItems="1">
        <paragraph indent="1" stylename="list_unordered" SDgroup="list">BBB. Para 2</paragraph>
        <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
        <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
        <List indent="2" maxItems="2">
          <paragraph indent="2" stylename="list_unordered" SDgroup="list">CCC. Para 3a</paragraph>
          <paragraph indent="2" stylename="list_unordered" SDgroup="list">DDD. Para 3b/4</paragraph>
        </List>
        <paragraph indent="1" stylename="list_unordered" SDgroup="list">EEE. Para 5</paragraph>
        <paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
      </List>
      <paragraph indent="0" stylename="list_unordered" SDgroup="list">FFF. Para 6</paragraph>
  </List>
</doc>

I have tried many, many versions of for-each-group with group-by/group-adjacent, as well as plain for-each, but I end up with out-of-sequence output and/or correctly nested output with duplicated data (i.e. an item is correctly nested and then, at some point, the same item is output again). I believe this is caused by multiply nested for-each loops.

I'm 99.9% sure this is doable, but I can't find a solution that produces what I need. Once again I'm asking for help, which will of course be greatly appreciated.

FYI: I have no control over the input data and no means of changing it, other than within XSLT.


Solution

  • Following your comment, your intended output is a little easier to understand. However, the output you provided doesn't match your description: maxItems for the second List element (the one just after AAA) should be 2, not 1, because that List element has two list_unordered paragraphs as immediate children.

    I've reworked my example implementation:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="3.0"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:l="local:functions"
                    exclude-result-prefixes="#all">
    
        <xsl:output indent="yes" omit-xml-declaration="yes" />
    
        <xsl:mode on-no-match="shallow-copy"/>
    
        <xsl:template match="/doc" >
            <xsl:copy>
                <xsl:sequence select="l:processParagraphs('-1', *)" />
            </xsl:copy>
        </xsl:template>
    
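        <!-- Walks $paras in document order. If the first paragraph is a list item
             indented deeper than $indentLevel, it and every following paragraph at
             that indent or deeper are wrapped in a List element, and processing
             continues with whatever is left. Otherwise the first paragraph is
             emitted as is and the walk carries on. -->
        <xsl:function name="l:processParagraphs" as="element()*">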
            <xsl:param name="indentLevel" as="xs:string"           />
            <xsl:param name="paras"       as="element(paragraph)*" />
    
            <xsl:choose>
                <xsl:when test="head($paras)/@stylename eq 'list'
                                and
                                xs:integer(head($paras)/@indent) gt xs:integer($indentLevel)">
                    <xsl:variable name="itemsUnderList" as="element(paragraph)*"
                                  select="l:nextInList(head($paras)/@indent, $paras)" />
                    <xsl:variable name="processedInner" as="element()*"
                                  select="l:processInner(head($paras)/@indent, $itemsUnderList)" />
                    <List indent="{head($paras)/@indent}" maxItems="{count($processedInner[@stylename eq 'list_unordered'])}">
                        <xsl:sequence select="$processedInner" />
                    </List>
                    <xsl:sequence select="l:processParagraphs($indentLevel, $paras except $itemsUnderList)" />
                </xsl:when>
                <xsl:otherwise>
                    <xsl:sequence select="l:processInner($indentLevel, $paras)" />
                </xsl:otherwise>
            </xsl:choose>
        </xsl:function>
    
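        <!-- Emits the first paragraph through the templates (so stylename 'list'
             becomes 'list_unordered') and hands the rest back to l:processParagraphs. -->
        <xsl:function name="l:processInner" as="element()*">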
            <xsl:param name="indentLevel" as="xs:string"           />
            <xsl:param name="paras"       as="element(paragraph)*" />
    
            <xsl:apply-templates select="head($paras)" /> <!-- to replace list stylename -->
            <xsl:if test="tail($paras)" >
                <xsl:sequence select="l:processParagraphs($indentLevel, tail($paras))" />
            </xsl:if>
        </xsl:function>
    
        <xsl:template match="paragraph/@stylename[. eq 'list']">
            <xsl:attribute name="stylename" select="'list_unordered'" />
        </xsl:template>
    
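        <!-- Returns the leading run of $paras whose indent is at least $indentLevel,
             i.e. everything that belongs to the list starting at that indent. -->
        <xsl:function name="l:nextInList" as="element(paragraph)*" >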
            <xsl:param name="indentLevel" as="xs:string"           />
            <xsl:param name="paras"       as="element(paragraph)*" />
    
            <xsl:if test="xs:integer(head($paras)/@indent) ge xs:integer($indentLevel)" >
                <xsl:sequence select="(head($paras),
                                   l:nextInList($indentLevel, tail($paras)))" />
            </xsl:if>
        </xsl:function>
    
    </xsl:stylesheet>
    

    which produces:

    <doc>
       <paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
       <paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
       <List indent="0" maxItems="2">
          <paragraph indent="0" stylename="list_unordered" SDgroup="list">AAA. Para 1.</paragraph>
          <List indent="1" maxItems="2">
             <paragraph indent="1" stylename="list_unordered" SDgroup="list">BBB. Para 2</paragraph>
             <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
             <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
             <List indent="2" maxItems="2">
                <paragraph indent="2" stylename="list_unordered" SDgroup="list">CCC. Para 3a</paragraph>
                <paragraph indent="2" stylename="list_unordered" SDgroup="list">DDD. Para 3b/4</paragraph>
             </List>
             <paragraph indent="1" stylename="list_unordered" SDgroup="list">EEE. Para 5</paragraph>
             <paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
          </List>
          <paragraph indent="0" stylename="list_unordered" SDgroup="list">FFF. Para 6</paragraph>
       </List>
    </doc>
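
    Since the question asks specifically about for-each-group, here is a minimal sketch of the same nesting done with group-adjacent, which keeps input order because adjacent grouping never reorders items. It is an alternative to the implementation above, not a replacement for it, and it rests on assumptions: every paragraph with SDgroup="list" carries an integer indent, top-level list items have indent 0, a 'continued' paragraph is indented exactly one step deeper than the item it continues, and indent levels increase one at a time. The helper names l:depth and l:wrap are made up for this illustration.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:l="local:functions"
                    exclude-result-prefixes="#all"
                    version="3.0">

        <xsl:output indent="yes" omit-xml-declaration="yes"/>

        <xsl:mode on-no-match="shallow-copy"/>

        <!-- Split the children of doc into adjacent runs of list and non-list
             paragraphs; only the list runs are nested (assumed to start at indent 0). -->
        <xsl:template match="/doc">
            <xsl:copy>
                <xsl:for-each-group select="paragraph" group-adjacent="@SDgroup = 'list'">
                    <xsl:choose>
                        <xsl:when test="current-grouping-key()">
                            <xsl:sequence select="l:wrap(current-group(), 0)"/>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:apply-templates select="current-group()"/>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each-group>
            </xsl:copy>
        </xsl:template>

        <!-- Hypothetical helper: nesting depth of a paragraph. A 'continued'
             paragraph counts as one level shallower than its indent, so it
             stays next to the list item it continues. -->
        <xsl:function name="l:depth" as="xs:integer">
            <xsl:param name="p" as="element(paragraph)"/>
            <xsl:sequence select="xs:integer($p/@indent)
                                  - (if ($p/@stylename = 'continued') then 1 else 0)"/>
        </xsl:function>

        <!-- Hypothetical helper: wraps $paras in a List for $level. Adjacent runs
             that are deeper than $level recurse; everything else is emitted in place. -->
        <xsl:function name="l:wrap" as="element()*">
            <xsl:param name="paras" as="element(paragraph)*"/>
            <xsl:param name="level" as="xs:integer"/>
            <List indent="{$level}"
                  maxItems="{count($paras[@stylename = 'list'][l:depth(.) = $level])}">
                <xsl:for-each-group select="$paras" group-adjacent="l:depth(.) gt $level">
                    <xsl:choose>
                        <xsl:when test="current-grouping-key()">
                            <xsl:sequence select="l:wrap(current-group(), $level + 1)"/>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:apply-templates select="current-group()"/>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each-group>
            </List>
        </xsl:function>

        <!-- Same stylename replacement as in the implementation above. -->
        <xsl:template match="paragraph/@stylename[. eq 'list']">
            <xsl:attribute name="stylename" select="'list_unordered'"/>
        </xsl:template>

    </xsl:stylesheet>
    ```

    Traced by hand against the sample input, this should give the same nesting and the same maxItems values as the output above, but treat it as an untested sketch. If an input skips an indent level, the recursion adds an extra List wrapper for the missing level. For XSLT 2.0, replace xsl:mode with an explicit identity template.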