Tags: sorting, xslt, xslt-2.0

Sorting numbers mixed with text in XSLT


I need to sort a set of documents by the attribute 'd' in ascending order, but the attribute values mix numbers with letters.

Sample values: d="11A-1-000003" d="11-1-000008a" d="11-16-000009" d="11-1C-000008" d="11-9-000002" d="12-1-000008a" d="11-15-00014" d="13-1-000007a" d="11-15B-00014a" d="11-24-00043a" d="11-3-000023" d="11-3-000023a" d="11-3-000023b"

I tried several approaches, but none of them produces the correct order:

    <xsl:sort select="normalize-space(@d)" data-type="text" order="ascending" case-order="upper-first"/>

    <xsl:sort select="replace(normalize-space(@d), '[^\d]', '')" data-type="number" order="ascending"/>

    <xsl:sort select="substring-before(normalize-space(@d), '-')" data-type="number"/>
    <xsl:sort select="substring-before(substring-after(normalize-space(@d), '-'), '-')" data-type="number"/>
    <xsl:sort select="substring-after(substring-after(normalize-space(@d), '-'), '-')" data-type="number"/>
    <xsl:sort select="substring-before(normalize-space(@d), '-')" data-type="text"/>
    <xsl:sort select="substring-before(substring-after(normalize-space(@d), '-'), '-')" data-type="text"/>
    <xsl:sort select="substring-after(substring-after(normalize-space(@d), '-'), '-')" data-type="text"/>
    <xsl:sort select="number(tokenize(@d, '-')[1])" data-type="number"/>
    <xsl:sort select="number(tokenize(@d, '-')[2])" data-type="number"/>
    <xsl:sort select="number(tokenize(@d, '-')[3])" data-type="number"/>
    <xsl:sort select="tokenize(normalize-space(@d), '-')[1]"/>
    <xsl:sort select="tokenize(normalize-space(@d), '-')[2]"/>
    <xsl:sort select="tokenize(normalize-space(@d), '-')[3]"/>

In the actual result, 11-3-000023 comes after 11-24-00043a, but it should come after 11-2; and 11-1C-000008 comes after 11-15B-00014a, but it should come right after 11-1.

Numbers should take precedence over letters: the numbers are chapters and the letters are subchapters, so, for example, 11-1C (chapter 1, subchapter C) belongs right after 11-1 and before 11-3.

For the sample values above, the expected order is: d="11-1-000008a" d="11-1C-000008" d="11-3-000023" d="11-3-000023a" d="11-3-000023b" d="11-9-000002" d="11-15-00014" d="11-15B-00014a" d="11-16-000009" d="11-24-00043a" d="11A-1-000003" d="12-1-000008a" d="13-1-000007a"
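To see why the first two attempts misorder the reported pairs, here is a rough Python model of them (an illustration, not XSLT). The helpers text_key and digits_key are hypothetical names, and the text comparison assumes the processor's default Unicode codepoint collation.

```python
import re

def text_key(d):
    # Models attempt 1 (data-type="text"): characters are compared one at a time,
    # so "11-24..." sorts before "11-3..." because '2' < '3'.
    return d.strip()

def digits_key(d):
    # Models attempt 2: replace(normalize-space(@d), '[^\d]', '') sorted as a number.
    # All digit runs are glued together, e.g. "11-3-000023" -> 113000023.
    return int(re.sub(r"[^\d]", "", d.strip()))

# Each pair is written in the expected order: the first value should sort first.
pairs = [("11-3-000023", "11-24-00043a"), ("11-1C-000008", "11-15B-00014a")]

for expected in pairs:
    for name, key in (("text", text_key), ("digits-only", digits_key)):
        got = tuple(sorted(expected, key=key))
        status = "ok" if got == expected else "WRONG"
        print(f"{name:<12} {got} {status}")
```

The text key gets both pairs wrong ('5' sorts before 'C', and '2' before '3'). The digits-only key gets the first pair wrong because 112400043 (from 11-24-00043a) is smaller than 113000023 (from 11-3-000023).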


Solution

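  • Here's a solution that uses a regular expression to split each @d value into six tokens (alternating numeric and non-numeric) and sorts the documents with a separate xsl:sort for each token. The numeric tokens are sorted with data-type="number", so chapter 3 comes before chapter 24; a missing letter produces an empty string, which sorts before any letter, so 11-1 comes before 11-1C and 11 before 11A.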

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      
      <xsl:output method="xml" indent="yes"/>
    
      <xsl:template match="documents">
        <xsl:copy>
          <!-- regular expression parses @d values into 6 tokens:
          a numeric token, an optional non-numeric token, an ignored hyphen, 
          a numeric token, an optional non-numeric token, an ignored hyphen, 
          a numeric token, an optional non-numeric token. 
           -->
          <xsl:variable name="parser">(\d+)([^\d]*)-(\d+)([^\d]*)-(\d+)([^\d]*)</xsl:variable>
          <xsl:perform-sort select="*">
            <xsl:sort select="replace(@d, $parser, '$1')" data-type="number"/>
            <xsl:sort select="replace(@d, $parser, '$2')"/>
            <xsl:sort select="replace(@d, $parser, '$3')" data-type="number"/>
            <xsl:sort select="replace(@d, $parser, '$4')"/>
            <xsl:sort select="replace(@d, $parser, '$5')" data-type="number"/>
            <xsl:sort select="replace(@d, $parser, '$6')"/>
          </xsl:perform-sort>
        </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
    

    Input:

    <documents>
       <document d="11A-1-000003"/>
       <document d="11-1-000008a"/>
       <document d="11-16-000009"/>
       <document d="11-1C-000008"/>
       <document d="11-9-000002"/>
       <document d="12-1-000008a"/>
       <document d="11-15-00014"/>
       <document d="13-1-000007a"/>
       <document d="11-15B-00014a"/>
       <document d="11-24-00043a"/>
       <document d="11-3-000023"/>
       <document d="11-3-000023a"/>
       <document d="11-3-000023b"/>
    </documents>
    

    Output:

    <documents>
       <document d="11-1-000008a"/>
       <document d="11-1C-000008"/>
       <document d="11-3-000023"/>
       <document d="11-3-000023a"/>
       <document d="11-3-000023b"/>
       <document d="11-9-000002"/>
       <document d="11-15-00014"/>
       <document d="11-15B-00014a"/>
       <document d="11-16-000009"/>
       <document d="11-24-00043a"/>
       <document d="11A-1-000003"/>
       <document d="12-1-000008a"/>
       <document d="13-1-000007a"/>
    </documents>
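To check the key logic without an XSLT processor, here is a rough Python equivalent (a sketch under assumptions, not part of the answer above). sort_key is a hypothetical helper that applies the same regular expression and returns the six tokens as a tuple, with the numeric tokens converted to integers, so tuple comparison mirrors the six xsl:sort keys in order. It uses re.fullmatch, which is stricter than XPath replace(): a value that doesn't fit the pattern raises an error instead of being left unchanged.

```python
import re

# Same pattern as the stylesheet's $parser variable: three numeric tokens,
# each followed by an optional non-numeric suffix, separated by hyphens.
PARSER = re.compile(r"(\d+)([^\d]*)-(\d+)([^\d]*)-(\d+)([^\d]*)")

def sort_key(d):
    """Build the six-part key that the six xsl:sort elements compare, in order."""
    m = PARSER.fullmatch(d.strip())
    if m is None:
        # Stricter than XPath replace(), which would return the value unchanged.
        raise ValueError(f"unexpected @d format: {d!r}")
    n1, s1, n2, s2, n3, s3 = m.groups()
    # Numeric tokens compare as numbers (like data-type="number"); suffixes
    # compare as strings, and an empty suffix sorts before any letter.
    return (int(n1), s1, int(n2), s2, int(n3), s3)

documents = [
    "11A-1-000003", "11-1-000008a", "11-16-000009", "11-1C-000008",
    "11-9-000002", "12-1-000008a", "11-15-00014", "13-1-000007a",
    "11-15B-00014a", "11-24-00043a", "11-3-000023", "11-3-000023a",
    "11-3-000023b",
]
expected = [
    "11-1-000008a", "11-1C-000008", "11-3-000023", "11-3-000023a",
    "11-3-000023b", "11-9-000002", "11-15-00014", "11-15B-00014a",
    "11-16-000009", "11-24-00043a", "11A-1-000003", "12-1-000008a",
    "13-1-000007a",
]

result = sorted(documents, key=sort_key)
print(result == expected)
```

Like the stylesheet, this assumes every value has exactly three hyphen-separated parts; values with a different shape would need a different pattern.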