
Using xsl:accumulator to keep track of text nodes between two PIs


I am learning about accumulators in XSLT 3.0, but I cannot find any examples that help me solve my current problem. I have large files in which processing instructions are used to mark modifications, and I need to turn these into visible markers for the review process. With an accumulator, I have managed to keep track of the latest modification code to be shown. So far, so good.

As the original files are massive, I created a small sample input file that shows the essence of my task, and I adapted my XSL to show what I am trying to do with the accumulator.

Simple input file:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <div>
        <p>Paragraph 1</p>
        <?MyPI Start Modification 1?>
        <p>Paragraph 2</p>
        <p>Paragraph 3</p>
        <?MyPI End Modification 1?>
    </div>
    <div>
        <list>
            <item>
                <p>Paragraph 4</p>
                <?MyPI Start Modification 1?>
                <p>Paragraph 5</p>
                <?MyPI End Modification 1?>
            </item>
            <item>
                <?MyPI Start Modification 1?>
                <p>Paragraph 6</p>
                <p>Paragraph 7</p>
                <?MyPI End Modification 1?>
                <?MyPI Start Modification 2?>
                <p>Paragraph 8</p>
                <?MyPI End Modification 2?>
            </item>
        </list>
        <p>Paragraph 9</p>
    </div>
</root>

My XSL using an accumulator for the current modification:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">
    
    <xsl:mode use-accumulators="#all"/>
    
    <xsl:accumulator name="modifier" initial-value="'Base text'">
        <xsl:accumulator-rule match="processing-instruction('MyPI')[contains(.,'Modification')]">           
            <xsl:choose>
                <xsl:when test="contains(.,'Start')">
                    <xsl:value-of select="substring-after(.,'Start ')"/>
                </xsl:when>
                <xsl:otherwise>Base text</xsl:otherwise>
            </xsl:choose>
        </xsl:accumulator-rule>
    </xsl:accumulator>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="processing-instruction('MyPI')">
        <marker>
            <xsl:value-of select="accumulator-after('modifier')"/>
        </marker>
    </xsl:template>

</xsl:stylesheet>

Output with this XSL:

<?xml version="1.0" encoding="UTF-8"?><root>
    <div>
        <p>Paragraph 1</p>
        <marker>Modification 1</marker>
        <p>Paragraph 2</p>
        <p>Paragraph 3</p>
        <marker>Base text</marker>
    </div>
    <div>
        <list>
            <item>
                <p>Paragraph 4</p>
                <marker>Modification 1</marker>
                <p>Paragraph 5</p>
                <marker>Base text</marker>
            </item>
            <item>
                <marker>Modification 1</marker>
                <p>Paragraph 6</p>
                <p>Paragraph 7</p>
                <marker>Base text</marker>
                <marker>Modification 2</marker>
                <p>Paragraph 8</p>
                <marker>Base text</marker>
            </item>
        </list>
        <p>Paragraph 9</p>
    </div>
</root>

The problem is that a closing marker and the following opening marker for the same modification code should be hidden when there is no visible text between them. The two PIs may follow each other immediately (which is fairly simple to handle), but there may also be element boundaries without any text between them. I tried to create an accumulator that keeps track of all text since the last modification marker, but it calls the same accumulator from within its own rule, which leads to nested calls and a runtime error. What I am looking for is a method that keeps appending text to an accumulator and resets it to an empty string whenever a modification PI is found. This is the trial accumulator that caused too many nested calls:

<xsl:accumulator name="text" initial-value="''">
    <xsl:accumulator-rule match="node()">
        <xsl:choose>
            <xsl:when test="self::processing-instruction('MyPI')"/>
            <xsl:when test="self::text()">
                <xsl:value-of select="concat(accumulator-after('text'),.)"/>
            </xsl:when>
        </xsl:choose>
    </xsl:accumulator-rule>
</xsl:accumulator>

I guess I do not yet understand how the accumulator works, which makes it hard to get the result I am looking for.

Required output for the above simple XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <div>
        <marker>Base text</marker>
        <p>Paragraph 1</p>
        <marker>Modification 1</marker>
        <p>Paragraph 2</p>
        <p>Paragraph 3</p>
        <marker>Base text</marker>
    </div>
    <div>
        <list>
            <item>
                <p>Paragraph 4</p>
                <marker>Modification 1</marker>
                <p>Paragraph 5</p>
            </item>
            <item>
                <p>Paragraph 6</p>
                <p>Paragraph 7</p>
                <marker>Modification 2</marker>
                <p>Paragraph 8</p>
            </item>
        </list>
        <marker>Base text</marker>
        <p>Paragraph 9</p>
    </div>
</root>

I hope someone can point me in the right direction. Accumulating text nodes since a particular node seems like a problem other people must need to solve as well. In my current case I do not need the actual text content; I only need to know whether there is any visible text since the last PI (that is, whitespace-only text must be ignored in this check).

If there is another method that does not involve accumulators, that would be fine, too.

Thanks in advance for any help


Solution

  • Perhaps

    <xsl:accumulator name="text" initial-value="()" as="xs:string?">
        <xsl:accumulator-rule match="processing-instruction('MyPI')" select="''"/>
        <xsl:accumulator-rule match="text()[normalize-space()]" select="$value || ."/>
    </xsl:accumulator>
    

    gives you an example of how to set up an accumulator that collects text node values. I am not sure I have understood the conditions for resetting the accumulator to an empty string, so the reset rule simply uses the match from your sample, transcribed into (hopefully) compilable XSLT 3. You can adapt it if there are more conditions relating to start/end pairs of processing instructions or to their names.
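    If only the presence of visible text matters (as the question says), a boolean is enough. The following is a minimal, untested sketch of one way to drive the markers from accumulators. It makes three assumptions:

    • Every MyPI reads "Start <code>" or "End <code>".
    • The accumulator names has-text and previous-pi are made up for illustration.
    • The reset rules use phase="end", so that accumulator-before() on a PI still reports the state since the previous PI. With the default phase="start", as in the accumulator above, the value would already be reset at that point.

    Accumulators only see nodes that come before, so hiding an End marker needs a look-ahead. For that, the sketch uses the following axis, which is the non-accumulator route the question allowed.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs"
        version="3.0">

        <!-- Copy everything not matched below and make the accumulators available. -->
        <xsl:mode on-no-match="shallow-copy" use-accumulators="#all"/>

        <!-- true() once visible (non-whitespace) text has been seen since the last MyPI.
             The reset runs in phase="end", so accumulator-before() on a MyPI still
             reports what happened between the previous MyPI and this one. -->
        <xsl:accumulator name="has-text" initial-value="false()" as="xs:boolean">
            <xsl:accumulator-rule match="text()[normalize-space()]" select="true()"/>
            <xsl:accumulator-rule match="processing-instruction('MyPI')" phase="end"
                select="false()"/>
        </xsl:accumulator>

        <!-- Hypothetical helper: content of the preceding MyPI, e.g. 'End Modification 1'.
             Also phase="end", so accumulator-before() on a MyPI sees the previous one. -->
        <xsl:accumulator name="previous-pi" initial-value="''" as="xs:string">
            <xsl:accumulator-rule match="processing-instruction('MyPI')" phase="end"
                select="string(.)"/>
        </xsl:accumulator>

        <!-- Start marker: hidden when the same modification was just ended
             and no visible text appeared in between. -->
        <xsl:template match="processing-instruction('MyPI')[starts-with(., 'Start ')]">
            <xsl:variable name="code" select="substring-after(., 'Start ')"/>
            <xsl:if test="accumulator-before('has-text')
                          or accumulator-before('previous-pi') ne ('End ' || $code)">
                <marker><xsl:value-of select="$code"/></marker>
            </xsl:if>
        </xsl:template>

        <!-- End marker: accumulators cannot look ahead, so the following axis is used.
             Hidden when the next MyPI is a Start with no visible text before it;
             that Start marker (shown or hidden) then describes the state. -->
        <xsl:template match="processing-instruction('MyPI')[starts-with(., 'End ')]">
            <xsl:variable name="next-pi" select="following::processing-instruction('MyPI')[1]"/>
            <xsl:variable name="next-text" select="following::text()[normalize-space()][1]"/>
            <xsl:if test="not(starts-with(string($next-pi), 'Start '))
                          or ($next-text &lt;&lt; $next-pi)">
                <marker>Base text</marker>
            </xsl:if>
        </xsl:template>

    </xsl:stylesheet>
    ```

    Traced by hand against the sample input, this matches the required output in two places. It hides the End/Start pair between Paragraphs 5 and 6, and it hides the End marker between Paragraphs 7 and 8.

    It differs from the required output in two other places, so those placement rules would still need to be added:

    • It does not add the leading Base text marker before Paragraph 1.
    • It keeps the last Base text marker at the end of the second item instead of moving it before Paragraph 9.

    Whitespace around a hidden PI is still copied. The following:: look-ups may also be worth profiling on the large real files.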

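    As for the $value variable, which gives a rule access to the accumulator's current value (so the rule never has to call accumulator-after('text') on the accumulator it belongs to), see https://www.w3.org/TR/xslt-30/#accumulator-declaration: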

    The select attribute and the contained sequence constructor of the xsl:accumulator-rule element are mutually exclusive: if the select attribute is present then the sequence constructor must be empty. The expression in the select attribute of xsl:accumulator-rule or the contained sequence constructor is evaluated with a static context that follows the normal rules for expressions in stylesheets, except that:

    An additional variable is present in the context. The name of this variable is value (in no namespace), and its type is the type that appears in the as attribute of the xsl:accumulator declaration.

    The context item for evaluation of the expression or sequence constructor will always be a node that matches the pattern in the match attribute.

    and two of the examples in https://www.w3.org/TR/xslt-30/#accumulator-examples also use $value.
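    Applied to the trial accumulator from the question, the essential fix is to read the running value through $value instead of calling accumulator-after('text') inside the text accumulator's own rule. That call makes the accumulator depend on itself.

    The sketch below keeps the question's sequence-constructor style and is meant to sit inside the question's stylesheet, which already declares the xs prefix. Two parts of it are additions:

    • The [normalize-space()] filter ignores whitespace-only text.
    • The xsl:otherwise branch keeps the current value. In the trial version, the empty PI branch and every non-text node (such as each p element) produced an empty sequence, so the accumulator was cleared instead of kept.

    ```xml
    <xsl:accumulator name="text" initial-value="''" as="xs:string">
        <xsl:accumulator-rule match="node()">
            <xsl:choose>
                <!-- Reset at every MyPI. -->
                <xsl:when test="self::processing-instruction('MyPI')">
                    <xsl:sequence select="''"/>
                </xsl:when>
                <!-- Append visible text; $value is this accumulator's value so far. -->
                <xsl:when test="self::text()[normalize-space()]">
                    <xsl:sequence select="$value || ."/>
                </xsl:when>
                <!-- Any other node (elements, whitespace-only text, comments): keep the value. -->
                <xsl:otherwise>
                    <xsl:sequence select="$value"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:accumulator-rule>
    </xsl:accumulator>
    ```

    The select-based version at the top of this answer does the same more compactly. A node that matches no accumulator-rule leaves the value unchanged, so no otherwise branch is needed there.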