xslt-2.0 xslt-grouping

How do I restructure this block of XML?


Okay, so what's been handed to me is this block of nodes, the result of a Word-to-XML transformation that I have no control over. The next step will be a transformation to HTML, and for that it makes more sense to have "Student_Response" blocks that contain both plain and italic text, rather than going back and forth between Student_Response and Student_Response_Italic the way they do now. So here's the way it is now ...

<ul outputclass="Spoken_Bullet">
   <li>This is a block of standard text.</li>
   <li>“What did people bring to the picnic? Say <i>Krista</i> with me: <i>Krista</i>. <i>Krista</i> starts with /k/. Can Krista bring ketchup?” 
      <ph outputclass="Student_Response">(Yes. </ph>
      <ph outputclass="Student_Response_Italic">Krista</ph>
      <ph outputclass="Student_Response"> and </ph>
      <ph outputclass="Student_Response_Italic">ketchup</ph>
      <ph outputclass="Student_Response"> both start with /k/.)</ph> “Can Krista bring relish?” 
      <ph outputclass="Student_Response">(No. </ph>
      <ph outputclass="Student_Response_Italic">Relish</ph>
      <ph outputclass="Student_Response"> does not start with /k/.)</ph> “Can Krista bring kale?” 
      <ph outputclass="Student_Response">(Yes. </ph>
      <ph outputclass="Student_Response_Italic">Krista</ph>
      <ph outputclass="Student_Response"> and </ph>
      <ph outputclass="Student_Response_Italic">kale</ph>
      <ph outputclass="Student_Response"> both start with /k/.)</ph>
   </li>
</ul>

And here's the way it needs to be ...

<ul outputclass="Spoken_Bullet">
   <li>This is a block of standard text.</li>
   <li>“What did people bring to the picnic? Say <i>Krista</i> with me: <i>Krista</i>. <i>Krista</i> starts with /k/. Can Krista bring ketchup?” 
      <ph outputclass="Student_Response">(Yes. <i>Krista</i> and <i>ketchup</i> both start with /k/.)</ph> “Can Krista bring relish?” 
      <ph outputclass="Student_Response">(No. <i>Relish</i> does not start with /k/.)</ph> “Can Krista bring kale?” 
      <ph outputclass="Student_Response">(Yes. <i>Krista</i> and <i>kale</i> both start with /k/.)</ph>
   </li>
</ul>

So the trick here is not just that I'm converting the Student_Response_Italic elements to <i> tags, which is easy enough, but that these then need to be nested inside Student_Response elements, the start and end of which are defined by the parentheses.

For starters, this is the environment I'm working in ...

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="r xs"
    version="2.0">

And after trying a bunch of different unsuccessful methods, this convoluted mess is the closest I've gotten so far ...

<xsl:template match="ph[starts-with(@outputclass, 'Student_Response')]">
  <xsl:choose>
    <xsl:when test="starts-with(text(),'(') and following-sibling::ph">
    <ph outputclass="Student_Response">
    <xsl:value-of select="text()"/>
    <xsl:for-each select="./following-sibling::node()">
      <xsl:choose>
        <xsl:when test="name() = 'ph' and not(ends-with(text(),')'))">
          <xsl:choose>
            <xsl:when test="name() = 'ph' and @outputclass = 'Student_Response_Italic'">
              <xsl:element name="i"><xsl:value-of select="text()"/></xsl:element>
            </xsl:when>
            <xsl:when test="name() = 'ph' and not(@outputclass = 'Student_Response_Italic')">
              <xsl:value-of select="text()"/>
            </xsl:when>
            <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
          </xsl:choose>
        </xsl:when>
        <xsl:otherwise><xsl:value-of select="text()"/></xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
    </ph>
    </xsl:when>
  </xsl:choose>
</xsl:template>

Where this fails, though, is that each of the Student_Response elements ends up containing the contents of the next one, until it winds down to the third having the correct contents (see below). And I understand why this is happening, because xsl:for-each is being told to "loop" through all following content. And so I probably need some kind of xsl:if conditional to make it stop outputting content after it reaches the first end parenthesis. But for the life of me, I've tried twenty ways to build that conditional over the past few days, and each one comes out worse than the last. To the point where I'm not even sure this is the right way to go about it.

<ul outputclass="Spoken_Bullet">
  <li>This is a block of standard text.</li>
  <li>“What did people bring to the picnic? Say <i>Krista</i> with me: <i>Krista</i>. <i>Krista</i> starts with /k/. Can Krista bring ketchup?” 
    <ph outputclass="Student_Response">(Yes. <i>Krista</i> and <i>ketchup</i> both start with /k/.)(No. <i>Relish</i> does not start with /k/.)(Yes. <i>Krista</i> and <i>kale</i> both start with /k/.)</ph> “Can Krista bring relish?” 
    <ph outputclass="Student_Response">(No. <i>Relish</i> does not start with /k/.)(Yes. <i>Krista</i> and <i>kale</i> both start with /k/.)</ph> “Can Krista bring kale?” 
    <ph outputclass="Student_Response">(Yes. <i>Krista</i> and <i>kale</i> both start with /k/.)</ph>
  </li>
</ul>

In other words, please help me!


Solution

  • This seems like a task for a nested xsl:for-each-group, with group-starting-with on the outer grouping and group-ending-with on the inner one, here done in XSLT 3:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="#all"
      expand-text="yes">
      
      <xsl:template match="*[ph/@outputclass = 'Student_Response']">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:for-each-group select="node()" group-starting-with="ph[@outputclass = 'Student_Response'][starts-with(., '(')]">
            <xsl:choose>
              <xsl:when test="self::ph[@outputclass = 'Student_Response'][starts-with(., '(')]">
                <xsl:for-each-group select="current-group()" group-ending-with="ph[@outputclass = 'Student_Response'][ends-with(., ')')]">
                  <xsl:choose>
                    <xsl:when test="current-group()[last()][self::ph[@outputclass = 'Student_Response'][ends-with(., ')')]]">
                      <xsl:copy>
                        <xsl:apply-templates select="@*, node(), tail(current-group())"/>
                      </xsl:copy>
                    </xsl:when>
                    <xsl:otherwise>
                      <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                  </xsl:choose>
                </xsl:for-each-group>
              </xsl:when>
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>
      
      <xsl:template match="ph[@outputclass = 'Student_Response_Italic']">
        <i>
          <xsl:apply-templates/>
        </i>
      </xsl:template>
      
      <xsl:template match="ph[@outputclass = 'Student_Response']">
        <xsl:apply-templates/>
      </xsl:template>
    
      <xsl:mode on-no-match="shallow-copy"/>
      
      <xsl:output indent="no"/>
      
    </xsl:stylesheet>
    

    If you really are using an XSLT 2 processor (the supported versions of e.g. Saxon are XSLT 3 processors these days), remove the xsl:mode declaration and replace it with the identity transformation template, e.g.

    <xsl:template match="@* | node()">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:template>
    

    and replace tail(expression) calls with e.g. expression[position() gt 1] or subsequence(expression, 2).
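
    Put together, an XSLT 2 version might look like the sketch below. It is the stylesheet from above with only those two substitutions applied, plus two changes that are my own assumptions about what a strict XSLT 2 processor wants: version="2.0" instead of "3.0", and the expand-text attribute dropped (it is an XSLT 3 feature and nothing in this stylesheet relies on it). The comments are added for orientation only.

    ```xslt
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="2.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="#all">

      <xsl:output indent="no"/>

      <!-- Identity template, standing in for xsl:mode on-no-match="shallow-copy" -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Any element that has Student_Response children: regroup its child nodes -->
      <xsl:template match="*[ph/@outputclass = 'Student_Response']">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <!-- Outer grouping: a new group starts at each ph that opens with "(" -->
          <xsl:for-each-group select="node()"
            group-starting-with="ph[@outputclass = 'Student_Response'][starts-with(., '(')]">
            <xsl:choose>
              <xsl:when test="self::ph[@outputclass = 'Student_Response'][starts-with(., '(')]">
                <!-- Inner grouping: cut the group after the ph that closes with ")" -->
                <xsl:for-each-group select="current-group()"
                  group-ending-with="ph[@outputclass = 'Student_Response'][ends-with(., ')')]">
                  <xsl:choose>
                    <xsl:when test="current-group()[last()][self::ph[@outputclass = 'Student_Response'][ends-with(., ')')]]">
                      <!-- Copy the first item of the group (the opening ph)
                           and pull the rest of the group inside it -->
                      <xsl:copy>
                        <xsl:apply-templates select="@*, node(), subsequence(current-group(), 2)"/>
                      </xsl:copy>
                    </xsl:when>
                    <xsl:otherwise>
                      <!-- Whatever follows the closing parenthesis stays outside -->
                      <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                  </xsl:choose>
                </xsl:for-each-group>
              </xsl:when>
              <xsl:otherwise>
                <!-- Content before the first "(" is processed unchanged -->
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>

      <!-- Italic runs become <i> -->
      <xsl:template match="ph[@outputclass = 'Student_Response_Italic']">
        <i>
          <xsl:apply-templates/>
        </i>
      </xsl:template>

      <!-- Inner Student_Response runs are unwrapped; only the opening one is copied above -->
      <xsl:template match="ph[@outputclass = 'Student_Response']">
        <xsl:apply-templates/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    One thing to be aware of: if the input really has line breaks and indentation between the ph elements, as in the question's listing, those whitespace-only text nodes are members of the groups too, so they end up copied inside the merged ph.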