
XML Canonicalization lexicographical ordering


There are official test cases for XML canonicalization, published as "Test cases for Canonical XML 2.0".

One of them looks like this:

<!DOCTYPE doc [<!ATTLIST e9 attr CDATA "default">]>
<doc>
   <e1   />
   <e2   ></e2>
   <e3   name = "elem3"   id="elem3"   />
   <e4   name="elem4"   id="elem4"   ></e4>
   <e5 a:attr="out" b:attr="sorted" attr2="all" attr="I'm"
      xmlns:b="http://www.ietf.org"
      xmlns:a="http://www.w3.org"
      xmlns="http://example.org"/>
   <e6 xmlns="" xmlns:a="http://www.w3.org">
      <e7 xmlns="http://www.ietf.org">
         <e8 xmlns="" xmlns:a="http://www.w3.org">
            <e9 xmlns="" xmlns:a="http://www.ietf.org"/>
         </e8>
      </e7>
   </e6>
</doc> 

The given canonicalized form is:

<doc>
   <e1></e1>
   <e2></e2>
   <e3 id="elem3" name="elem3"></e3>
   <e4 id="elem4" name="elem4"></e4>
   <e5 xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I'm" attr2="all" b:attr="sorted" a:attr="out"></e5>
   <e6>
      <e7 xmlns="http://www.ietf.org">
         <e8 xmlns="">
            <e9 attr="default"></e9>
         </e8>
      </e7>
   </e6>
</doc>

I'm wondering why b:attr="sorted" comes before a:attr="out" in the canonical output. I'd be really thankful if someone could clarify this for me.


Solution

  • Don't look at the namespace prefixes; look at the namespace URIs.

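    Although the prefix a sorts before b, the URI bound to b (http://www.ietf.org) sorts before the URI bound to a (http://www.w3.org), because the two URIs first differ at i versus w: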

      xmlns:b="http://www.ietf.org"
      xmlns:a="http://www.w3.org"
    

    Therefore b:attr="sorted" comes before a:attr="out" canonically.

    This is explained in section 2.3:

    Note: In e5, b:attr precedes a:attr because the primary key is namespace URI not namespace prefix
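
As a rough illustration of this ordering rule, here is a minimal Python sketch that sorts the attributes of e5 by namespace URI first and local name second. The names prefix_to_uri and sort_key are made up for this example. It is not a canonicalizer: it only covers regular attributes and leaves out the namespace declarations, which the canonical form emits before them.

```python
# Prefix bindings in scope on <e5>, copied from the test case.
prefix_to_uri = {
    "a": "http://www.w3.org",
    "b": "http://www.ietf.org",
}

# Regular attributes of <e5> in document order (namespace declarations omitted).
attribute_names = ["a:attr", "b:attr", "attr2", "attr"]


def sort_key(qname):
    """Return (namespace URI, local name) for a qualified attribute name."""
    prefix, sep, local = qname.rpartition(":")
    # Unprefixed attributes are in no namespace: the default namespace
    # declared with xmlns="..." does not apply to them.
    uri = prefix_to_uri[prefix] if sep else ""
    return (uri, local)


ordered = sorted(attribute_names, key=sort_key)
print(ordered)  # ['attr', 'attr2', 'b:attr', 'a:attr']
```

The unprefixed attributes come first because their namespace URI is empty, and b:attr precedes a:attr because "http://www.ietf.org" compares lower than "http://www.w3.org". This matches the attribute order in the canonical form of e5 shown in the question.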