Remove Duplicate Namespaces from XML in Java

I have the following SOAP response as a sample:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:mycompany:Service:2" xmlns:urn1="urn:mycompany:Customer:2">
  <soapenv:Header />
  <soapenv:Body>
    <urn:GetResponse>
      <urn:StatusCode>002</urn:StatusCode>
      <urn:StatusMessage>Pass</urn:StatusMessage>
      <urn:CustomerAffiliations>
        <urn:CustomerAffiliation>
          <urn:CustomerID>II39642</urn:CustomerID>
          <urn:CustomerContactDetails>
            <ns3:Channel xmlns:ns3="urn:mycompany:Customer:2">Business Phone</ns3:Channel>
            <ns3:Value xmlns:ns3="urn:mycompany:Customer:2">5553647</ns3:Value>
          </urn:CustomerContactDetails>
        </urn:CustomerAffiliation>
      </urn:CustomerAffiliations>
    </urn:GetResponse>
  </soapenv:Body>
</soapenv:Envelope>

The namespace urn:mycompany:Customer:2 is already declared with the prefix urn1 on soapenv:Envelope, but it is declared again, with the prefix ns3, on ns3:Channel and ns3:Value.

The requirement is to clean the XML content so that child elements use the namespace prefixes already declared on soapenv:Envelope.

Is there a way in Java to clean/normalize this XML content so that it uses the declared namespaces and drops the duplicate declarations?

Solution

The following JDOM 2 code replaces "duplicated" namespaces with their inherited equivalents. It handles elements only: attributes can carry their own namespaces too, and those are left unchanged.

Note that this has poor time complexity: the namespace lookups it makes on every element walk up the ancestor chain, so for larger XML documents the cost can degenerate badly. Don't use it on deeply nested documents or on documents larger than a few hundred elements; at some point the time complexity will bite you.

For small messages like your SOAP example, though, it is more than adequate.

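import java.util.List;

import org.jdom2.Element;
import org.jdom2.Namespace;

// Returns the first namespace in the list that is bound to the given URI, or null if none is.
private static Namespace findFirst(List<Namespace> namespaces, String uri) {
    for (Namespace ns : namespaces) {
        if (ns.getURI().equals(uri)) {
            return ns;
        }
    }
    return null;
}

// Recursively rebinds each prefixed element to an inherited prefix for the same URI,
// so the redundant local declaration (e.g. xmlns:ns3) is no longer needed.
public static void dedupElementNamespaces(Element node) {
    List<Namespace> created = node.getNamespacesIntroduced();
    if (!created.isEmpty()) {
        // This element declares something new: check its own namespace
        // against the declarations inherited from its ancestors.
        List<Namespace> inherited = node.getNamespacesInherited();
        // Never swap a default (unprefixed) namespace for a prefixed one.
        // Use isEmpty() here: != "" compares string references, not contents.
        if (!node.getNamespacePrefix().isEmpty()) {
            Namespace ens = node.getNamespace();
            Namespace use = findFirst(inherited, node.getNamespaceURI());
            // Same URI bound to a different prefix: switch to the inherited binding.
            if (use != null && !use.getPrefix().equals(ens.getPrefix())) {
                node.setNamespace(use);
            }
        }
    }
    for (Element e : node.getChildren()) {
        dedupElementNamespaces(e);
    }
}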

You can call that with:

dedupElementNamespaces(doc.getRootElement());

The methods node.getNamespacesIntroduced() and node.getNamespacesInherited() build their lists on every call by scanning up the element's ancestors, so their cost grows with the depth of nesting. See https://github.com/hunterhacker/jdom/blob/master/core/src/java/org/jdom2/Element.java#L1753
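If JDOM is not on your classpath, or the repeated upward scans matter for bigger documents, the same idea can be sketched with the JDK's built-in W3C DOM (org.w3c.dom) instead. This is a swapped-in alternative, not the JDOM code above: it passes a map from namespace URI to the outermost prefix bound to it down the recursion, so each element is visited once and no ancestor chain is rescanned. Assumptions: the names DomNamespaceDedup, dedup and dedupXml are hypothetical; as in the JDOM version only prefixed elements are rewritten and attributes are left alone; a local xmlns declaration is removed only when no attribute on the same element uses that prefix (descendant attributes are not checked); default (xmlns="...") declarations are ignored.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class (not a library API): single-pass namespace dedup on a W3C DOM.
public class DomNamespaceDedup {
    private static final String XMLNS = XMLConstants.XMLNS_ATTRIBUTE_NS_URI;

    // True if a regular (non-xmlns) attribute on el uses the given prefix.
    private static boolean attributeUsesPrefix(Element el, String prefix) {
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (!XMLNS.equals(a.getNamespaceURI()) && prefix.equals(a.getPrefix())) {
                return true;
            }
        }
        return false;
    }

    // inScope maps namespace URI -> outermost prefix an ancestor bound to it.
    static void dedup(Element el, Map<String, String> inScope) {
        String uri = el.getNamespaceURI();
        String prefix = el.getPrefix();
        if (uri != null && prefix != null) {              // unprefixed elements are left alone
            String outer = inScope.get(uri);
            if (outer != null && !outer.equals(prefix)
                    && !el.hasAttributeNS(XMLNS, outer)   // outer prefix is not rebound on this element
                    && !attributeUsesPrefix(el, prefix)) {
                el.removeAttributeNS(XMLNS, prefix);      // drop local xmlns:prefix (no-op if absent)
                el.setPrefix(outer);
            }
        }
        // Add the prefixed declarations still on this element to the scope seen by its children.
        Map<String, String> scope = inScope;
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (XMLNS.equals(a.getNamespaceURI()) && "xmlns".equals(a.getPrefix())) {
                if (scope == inScope) {
                    scope = new HashMap<>(inScope);       // copy-on-write: siblings keep the parent's view
                }
                scope.values().remove(a.getLocalName());  // prefix rebound here: forget its old URI
                scope.putIfAbsent(a.getValue(), a.getLocalName());
            }
        }
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                dedup((Element) c, scope);
            }
        }
    }

    // Parses xml, deduplicates element namespaces, and serializes without an XML declaration.
    public static String dedupXml(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);                  // otherwise getPrefix()/getNamespaceURI() return null
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            dedup(doc.getDocumentElement(), new HashMap<>());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process XML", e);
        }
    }
}
```

Calling DomNamespaceDedup.dedupXml(soapResponse) on the sample above should turn ns3:Channel and ns3:Value into urn1:Channel and urn1:Value without their xmlns:ns3 declarations. The map is copied only on elements that declare prefixes, so an element that declares nothing costs one map lookup instead of a walk up its ancestors.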