xml xslt xslt-1.0 processing-instruction

XSLT matching for text between XML processing instructions


Given the following XML:

<items>
<item id="item1">
    <description id="desc">
        <?RELAPP description="Relative" loc="start"?>
        <heading id="h1" level="1">HEADING 1</heading>
        <p id="p2" num="1">Paragraph A</p>
        <?RELAPP description="Relative" loc="end"?>
        <?SUMM description="Summary" loc="start"?>
        <heading id="h2" level="1">HEADING 2</heading>
        <p id="p3" num="2">Paragraph B</p>
        <p id="p4" num="3">Paragraph C</p>
        <heading id="h3" level="1">HEADING 3</heading>
        <p id="p5" num="4">Paragraph D</p>
        <p id="p6" num="5">Paragraph E</p>
        <?SUMM description="Summary" loc="end"?>
        <?drawings description="Drawings" loc="start"?>
        <drawings>
            <heading id="h4" level="1">HEADING 4</heading>
            <p id="p7" num="6">Paragraph F</p>
            <p id="p8" num="7">Paragraph G</p>          
        </drawings>
        <?drawings description="Drawings" loc="end"?>
    </description>
</item> 
</items>

I'm trying to get the text between:

<?SUMM description="Summary" loc="start"?>

and

<?SUMM description="Summary" loc="end"?>

That is:

HEADING 2 Paragraph B Paragraph C HEADING 3 Paragraph D Paragraph E

ideally with some separator between the headings and paragraphs.

The best XSLT I've been able to come up with is:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/> 
<xsl:template match="/items">
    <myItems>
        <xsl:apply-templates/>
    </myItems>
</xsl:template> 

<xsl:template match="item">
    <xsl:element name="info">
        <xsl:element name="summaryPI">          
            <xsl:for-each select="description/processing-instruction('SUMM')">
                <xsl:value-of select="."/>
            </xsl:for-each>         
        </xsl:element>
    </xsl:element>
</xsl:template>
</xsl:stylesheet>

but it only gets me this:

<?xml version="1.0" encoding="UTF-8"?>
 <myItems>
  <info>
   <summaryPI>description="Summary" loc="start"description="Summary" loc="end"</summaryPI>
  </info>
</myItems>

What expression should I use to select the text I want? I tried the preceding-sibling and following-sibling axes, but I couldn't get them to work. I'm using XSLT 1.0.


Solution

  • How about:

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:template match="/">
        <xsl:for-each select="//text()[preceding::processing-instruction('SUMM')[contains(., 'loc=&quot;start&quot;')]]
                                      [following::processing-instruction('SUMM')[contains(., 'loc=&quot;end&quot;')]] ">
            <xsl:value-of select="." />
            <xsl:if test="position()!=last()">
                <xsl:text>, </xsl:text>
            </xsl:if>   
        </xsl:for-each>
    </xsl:template>
    
    </xsl:stylesheet>
    

    Applied to your input example, the result will be:

    HEADING 2, Paragraph B, Paragraph C, HEADING 3, Paragraph D, Paragraph E
    

    Note: if all the nodes between the two processing instructions are siblings of those instructions (as they are in your example), the selection can be made a little more efficient by matching the sibling elements instead of every text node in the document:

    <xsl:for-each select="//*[preceding-sibling::processing-instruction('SUMM')[contains(., 'loc=&quot;start&quot;')]]
                             [following-sibling::processing-instruction('SUMM')[contains(., 'loc=&quot;end&quot;')]] ">
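
    For reference, here is a minimal sketch of how the sibling-based selection might fit into a complete stylesheet that keeps the myItems/info/summaryPI wrapper from your own attempt. It assumes each description holds at most one SUMM start/end pair and that the nodes in between are direct children of description. Evaluating the path relative to each item (rather than from `//`) is my own choice here, so that text from different items does not run together:

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/items">
        <myItems>
            <xsl:apply-templates select="item"/>
        </myItems>
    </xsl:template>

    <xsl:template match="item">
        <info>
            <summaryPI>
                <!-- Child elements of this item's description that come after
                     the SUMM "start" marker and before the SUMM "end" marker -->
                <xsl:for-each select="description/*
                        [preceding-sibling::processing-instruction('SUMM')[contains(., 'loc=&quot;start&quot;')]]
                        [following-sibling::processing-instruction('SUMM')[contains(., 'loc=&quot;end&quot;')]]">
                    <xsl:value-of select="."/>
                    <!-- Separator between entries, but not after the last one -->
                    <xsl:if test="position() != last()">
                        <xsl:text>, </xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </summaryPI>
        </info>
    </xsl:template>

    </xsl:stylesheet>
    ```

    Applied to your input example, the summaryPI element should contain the same text as above: HEADING 2, Paragraph B, Paragraph C, HEADING 3, Paragraph D, Paragraph E. The exact indentation of the surrounding elements depends on the processor.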