Tags: xquery, marklogic, marklogic-8

Normalize space in each element of XML using XQuery


I have an XML document like this:

<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>
        http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
</a:price-range>

I want to normalize the whitespace in the XML. In the example above, the c:id element contains leading whitespace (a newline plus indentation) before the URL. After normalizing, the XML should look like this:

<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
</a:price-range>

I looked at fn:normalize-space, but it works only on strings.
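To make the limitation concrete: fn:normalize-space does accept a node, but it atomizes it first, so the result is the node's string value as a plain xs:string and the element itself is gone. A small illustration using the c:id element from the sample (the `c` prefix is declared here only so the snippet runs on its own):

```xquery
declare namespace c = "http://iddn.icis.com/ns/core";

let $id := <c:id>
        http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
return (
  (: the element is atomized, so this yields only the trimmed string :)
  normalize-space($id),
  (: the result is an xs:string, not an element node :)
  normalize-space($id) instance of xs:string
)
```

This returns the URL string and `true`, which is why the element structure has to be rebuilt around the normalized text.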


Solution

  • This function worked for me:

    (:
      The rules/assumptions are:
      #1 Retain one leading space if the node isn't first, has non-space content, and has leading space.
      #2 Retain one trailing space if the node isn't last, isn't first, and has trailing space.
      #3 Retain one trailing space if the node isn't last, is first, has trailing space, and has non-space content.
      #4 Retain a single space if the node is an only child and only has space content.
      Any other text node is trimmed and collapsed with fn:normalize-space.
    :)
    declare function local:normalize-space-in-xml($input as element()) as element()
    {
      element {node-name($input)}
      {
        $input/@*,
        (: $child/position() is always 1 inside a path step, so track each child's index with "at" :)
        let $last := count($input/node())
        for $child at $pos in $input/node()
        return
          if ($child instance of element())
          then local:normalize-space-in-xml($child)
          else if ($child instance of text())
          then (
            let $norm := normalize-space($child)
            return
              (:#4 only child whose content is all space: keep one space (this overrules standard normalization):)
              if ($last eq 1 and string-length($child) ne 0 and $norm eq '')
              then text {' '}
              else (
                (:#1 not first, has non-space content, and has leading space:)
                let $lead := if ($pos ne 1 and $norm ne '' and matches($child, '^\s')) then ' ' else ''
                (:#2 not last, not first, has trailing space / #3 not last, first, has trailing space and non-space content:)
                let $trail := if ($pos ne $last and matches($child, '\s$') and ($pos ne 1 or $norm ne '')) then ' ' else ''
                (: build a single text node: a sequence such as (' ', 'x') would be joined with an extra space,
                   and a zero-length text node is simply dropped from the constructed element :)
                return text {concat($lead, $norm, $trail)}
              )
          )
          else $child (: comments and processing instructions are copied unchanged :)
      }
    };
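If the documents never contain mixed content (text interleaved with child elements, where a single separating space matters), the four rules can be dropped and every text node trimmed outright. The sketch below assumes exactly that; `local:normalize-text` is a hypothetical name, and unlike the function above it removes whitespace-only text nodes (such as indentation) instead of keeping a single space:

```xquery
declare namespace c = "http://iddn.icis.com/ns/core";

(: Assumption: no mixed content, so every text node can be trimmed and collapsed. :)
declare function local:normalize-text($node as node()) as node()?
{
  typeswitch ($node)
    case element() return
      element {node-name($node)} {
        $node/@*,
        for $child in $node/node() return local:normalize-text($child)
      }
    case text() return
      let $norm := normalize-space($node)
      (: whitespace-only text nodes (indentation) are dropped entirely :)
      return if ($norm eq '') then () else text {$norm}
    default return $node (: comments, PIs, etc. are copied unchanged :)
};

local:normalize-text(
  <c:id>
        http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
)
```

The trade-off: for mixed content like `<p>foo <b>bar</b> baz</p>` this variant glues the words together (`foo<b>bar</b>baz`), which is what rules #1 to #3 in the function above guard against.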