xml xslt indentation pre

How to ignore the first n tabs in an XSLT template


The situation

My XML files can contain pieces of code to display inside a <code> tag, but the indentation of the XML document conflicts with the indentation inside the <code> section.

Minimal Working Example

The XML file
        <article>
            <code lang="c">
            #include &lt;stdio.h&gt;
            int main() {
                // printf() displays the string inside quotation
                printf("Hello, World!");
                return 0;
            }
            </code>
        </article>
The piece of XSLT
    <xsl:template match="code">
        <pre><xsl:value-of select="."/></pre>
    </xsl:template>
The expected HTML rendering
<pre>#include &lt;stdio.h&gt;
int main() {
    // printf() displays the string inside quotation
    printf("Hello, World!");
    return 0;
}</pre>

Explanations

As you can see, the goal is to ignore the first n tabs on each line and the last n tabs (if any) inside the tags, where n is the number of tabs before the opening <code> tag. The first newline and the last newline (the one just before the tabs that precede the closing </code> tag) should also be ignored.

More explanations

Following @michael.hor257k's suggestion to clarify: in other words, the XSLT stylesheet should treat the <code> element shown above as if it had been written like this:

        <article>
            <code lang="c">#include &lt;stdio.h&gt;
int main() {
    // printf() displays the string inside quotation
    printf("Hello, World!");
    return 0;
}</code>
        </article>

As you can see, the tabs belonging to the XML indentation should not be included in the final HTML <pre> tag.

More graphically, the whitespace commented out below should be ignored during processing:

        <article>
            <code lang="c"><!--
         -->#include &lt;stdio.h&gt;
<!--     -->int main() {
<!--     -->    // printf() displays the string inside quotation
<!--     -->    printf("Hello, World!");
<!--     -->    return 0;
<!--     -->}<!--
         --></code>
        </article>

These spaces, tabs, and newlines correspond to the XML indentation, not to the indentation of the C code itself.

Conclusion — Question

So, is it possible in my XSLT to determine the number of tabs before the opening <code> tag in order to remove them from the beginning of each line of the content?


Solution

  • Try perhaps something like:

    XSLT 1.0 + EXSLT

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <!-- identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="code">
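        <!-- indentation of the <code> tag: the part after the newline in the nearest preceding text node -->
        <xsl:variable name="indent" select="substring-after(preceding-sibling::text()[1], '&#10;')" />
        <pre>
            <!-- split the content into lines, skip whitespace-only ones, strip the indentation from each -->
            <xsl:for-each select="str:tokenize(., '&#10;')[normalize-space()]">
                <xsl:value-of select="substring-after(., $indent)"/>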
                <xsl:if test="position()!=last()">
                    <xsl:text>&#10;</xsl:text>
                </xsl:if>
            </xsl:for-each>
        </pre>
    </xsl:template>
    
    </xsl:stylesheet>
    

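    Note that this relies on some assumptions that your example satisfies but other inputs may not: the whitespace before the opening <code> tag contains a single newline followed by the indentation, every line of the content starts with exactly that indentation, and whitespace-only lines inside the code can be dropped.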
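
    If an XSLT 2.0 processor is available, the same idea could probably be written without EXSLT. The following is only a sketch under the same assumptions about the input; it swaps str:tokenize() and the xsl:for-each loop for the standard XPath 2.0 tokenize() function and the separator attribute of xsl:value-of:

    ```xml
    <xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="code">
        <!-- assumption: the nearest preceding text node is one newline followed by the indentation -->
        <xsl:variable name="indent" select="substring-after(preceding-sibling::text()[1], '&#10;')"/>
        <pre>
            <!-- keep non-blank lines, strip the indentation, join them with newlines -->
            <xsl:value-of
                select="for $line in tokenize(., '\n')[normalize-space()]
                        return substring-after($line, $indent)"
                separator="&#10;"/>
        </pre>
    </xsl:template>

    </xsl:stylesheet>
    ```

    The [1] predicate matters more here than in XSLT 1.0: passing a sequence of several text nodes to substring-after() is a type error in 2.0, whereas 1.0 silently uses the first node in document order.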