
Is Xerces-J correct to fail lax validation for an unresolved xsi:type?


I am having a problem validating a SOAP envelope with the code below.

The error that I get is:

org.xml.sax.SAXParseException; cvc-elt.4.2: Cannot resolve 'ipo:UKAddress' to a type definition for element 'shipTo'.

The SOAP 1.1 envelope XSD defines Body as:

<xs:complexType name="Body">
  <xs:sequence>
    <xs:any namespace="##any" minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
  </xs:sequence>
</xs:complexType>
My expectation is that with processContents="lax" the validator should validate an element when it has a declaration for it and ignore the element when it does not. However, that is not what happens with xsi:type="ipo:UKAddress". I am only validating the SOAP envelope, not the Body contents.

It looks like a bug in Xerces-J. In the same chunk of code, XMLSchemaValidator.java:2152 checks processContents before raising an error:

else if (wildcard != null && wildcard.fProcessContents == XSWildcardDecl.PC_STRICT) {

By contrast, XMLSchemaValidator.java:2178 makes no such check, so the error is raised regardless of processContents:

fCurrentType = getAndCheckXsiType(element, xsiType, attributes);

The problem also occurs with Java 8. Any help, or confirmation that this is indeed a bug, is appreciated.

package com.example.xmlvalidate;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.security.CodeSource;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class Validate {
    private static final String envelope =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
            "<soapenv:Envelope \n" +
            "  xmlns=\"http://www.w3.org/2001/XMLSchema\"\n" +
            "  xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
            "  xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\"\n" +
            "  >\n" +
            "  <soapenv:Body>\n" +
            "    <ipo:purchaseOrder xmlns:ipo=\"http://www.example.com/IPO\">\n" +
            "      <shipTo exportCode=\"1\" xsi:type=\"ipo:UKAddress\">\n" +
            "        <name>Helen Zoe</name>\n" +
            "        <street>47 Eden Street</street>\n" +
            "        <city>Cambridge</city>\n" +
            "        <postcode>CB1 1JR</postcode>\n" +
            "      </shipTo>\n" +
            "    </ipo:purchaseOrder>\n" +
            "  </soapenv:Body>\n" +
            "</soapenv:Envelope>";

    // SOAP 1.1 envelope schema; note the trailing slash, as in the namespace URI.
    private static final String SOAP_1_1_ENVELOPE =
            "http://schemas.xmlsoap.org/soap/envelope/";
    protected static final String W3C_XML_SCHEMA =
            "http://www.w3.org/2001/XMLSchema";

    public static void validate() throws ParserConfigurationException, SAXException, IOException, TransformerException {
        final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();

        documentBuilderFactory.setNamespaceAware(true);
        documentBuilderFactory.setValidating(false);

        final Class<?> clazz = documentBuilderFactory.getClass();
        final CodeSource source = clazz.getProtectionDomain().getCodeSource();
        System.out.println("Document builder implementation: " + clazz.getName() + " from : " + (source == null ? "JRE" : source));

        final DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        final InputStream is = new ByteArrayInputStream(envelope.getBytes(StandardCharsets.UTF_8));
        final Document document = documentBuilder.parse(is);
        final DOMSource domSource = new DOMSource(document);

        final StreamSource streamSource = new StreamSource(new URL(SOAP_1_1_ENVELOPE).openStream());
        final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = schemaFactory.newSchema(streamSource);

        final Validator validator = schema.newValidator();
        validator.validate(domSource);
    }

    public static void main(String[] args) throws Exception {
        validate();
    }
}
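To reproduce the behaviour without downloading the SOAP schema, here is a minimal, self-contained sketch that uses only the JDK's built-in JAXP validator. It assumes a hypothetical schema (namespace urn:example:env, element body) whose only content is a lax ##any wildcard, mirroring the SOAP Body type above. The same undeclared payload is validated twice: once without xsi:type, which should pass, and once with the unresolvable xsi:type="ipo:UKAddress", which, with the JDK's bundled Xerces, should fail with cvc-elt.4.2.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class LaxXsiTypeDemo {
    // Hypothetical stand-in for the SOAP envelope schema: one global element
    // whose content is a lax ##any wildcard, like soapenv:Body.
    static final String SCHEMA =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
          + "           targetNamespace='urn:example:env' elementFormDefault='qualified'>"
          + "  <xs:element name='body'>"
          + "    <xs:complexType>"
          + "      <xs:sequence>"
          + "        <xs:any namespace='##any' minOccurs='0' maxOccurs='unbounded'"
          + "                processContents='lax'/>"
          + "      </xs:sequence>"
          + "    </xs:complexType>"
          + "  </xs:element>"
          + "</xs:schema>";

    // Undeclared payload without xsi:type: lax assessment finds nothing to check.
    static final String WITHOUT_XSI_TYPE =
            "<env:body xmlns:env='urn:example:env'>"
          + "  <ipo:purchaseOrder xmlns:ipo='http://www.example.com/IPO'>"
          + "    <shipTo><name>Helen Zoe</name></shipTo>"
          + "  </ipo:purchaseOrder>"
          + "</env:body>";

    // Same payload, but shipTo carries an xsi:type naming a type no loaded schema defines.
    static final String WITH_XSI_TYPE =
            "<env:body xmlns:env='urn:example:env'"
          + "          xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>"
          + "  <ipo:purchaseOrder xmlns:ipo='http://www.example.com/IPO'>"
          + "    <shipTo xsi:type='ipo:UKAddress'><name>Helen Zoe</name></shipTo>"
          + "  </ipo:purchaseOrder>"
          + "</env:body>";

    /** Returns null if the instance is valid, otherwise the first error message. */
    static String validate(String instance) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first error is thrown as a SAXParseException.
            validator.validate(new StreamSource(new StringReader(instance)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without xsi:type: " + validate(WITHOUT_XSI_TYPE));
        System.out.println("with xsi:type:    " + validate(WITH_XSI_TYPE));
    }
}
```

The first line of output should show null (valid) and the second the cvc-elt.4.2 message; the exact wording of that message may differ between JDK versions and locales.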

Solution

  • Short answer: in this case, Xerces is either unquestionably correct or else not provably wrong.

    Long answer:

    You don't specify whether you are using XSD 1.0 or XSD 1.1.

    In 1.0, the spec is slightly unclear about the effect on validity of an xsi:type attribute whose QName value does not resolve to a type definition in the schema. One natural reading of Validation Rule: Schema-Validity Assessment (Element) in 3.3.4 is that when xsi:type occurs, it must resolve (note the slight confusion in the text between conformance requirements and validity requirements). Another reading is that if clause 1.2.1.2.3 of the validation rule doesn't apply, then clauses 1.2.1.2, 1.2, and 1 clearly don't apply either, which leads to the conclusion that the element should be laxly assessed.

    The same two readings apply to clause 4.2 of Validation Rule: Element Locally Valid (Element) in the same section. That clause says the value of xsi:type "must resolve to a type definition", which means either that the element is invalid if the xsi:type value doesn't resolve, or (on a different reading of the rule) that in that case the element is clearly not (known to be) locally valid against the type indicated.

    In 1.1, the rules have been rewritten and perhaps made clearer. If the value of xsi:type is a QName that doesn't resolve to a type definition, then a fallback type is calculated and the element is validated against it; in the case you appear to have in mind, that type will be xsd:anyType. But 1.1 also makes very explicit that in that case the xsi:type attribute itself is invalid (clause 5 of Validation Rule: Attribute Locally Valid in 3.2.4).

    So under the rules of XSD 1.1, Xerces is clearly correct to flag the input as invalid, though an error code pointing at the xsi:type attribute rather than at the element would arguably be more apt.

    If you are operating with XSD 1.0, the error code makes clear that Xerces takes the first view of the non-resolving xsi:type value and treats it as a validity error. It is hard to prove from the specification text that this is the only possible interpretation, but it would be even harder to prove it wrong: it is clearly a plausible reading of the spec.

    If you want problems with xsi:type to be ignored rather than treated as validity errors, you want a skip wildcard, not a lax wildcard. (You can, of course, declare your own wrapper element for the SOAP payload with a skip wildcard in its content model, and thus force the validation behavior you want.)
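The skip-wildcard workaround can be sketched as follows. The wrapper namespace urn:example:wrapper and the element name payload are hypothetical. The class builds the same wrapper schema once with processContents="lax" and once with processContents="skip", then validates the question's payload against each. Under lax, the unresolved xsi:type should still be reported as cvc-elt.4.2; under skip, the wrapper's contents are not assessed at all, so validation should succeed.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SkipWrapperDemo {
    // Hypothetical wrapper element for the SOAP payload; processContents is supplied per run.
    static String wrapperSchema(String processContents) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
             + " targetNamespace='urn:example:wrapper' elementFormDefault='qualified'>"
             + "<xs:element name='payload'><xs:complexType><xs:sequence>"
             + "<xs:any namespace='##any' minOccurs='0' maxOccurs='unbounded'"
             + " processContents='" + processContents + "'/>"
             + "</xs:sequence></xs:complexType></xs:element>"
             + "</xs:schema>";
    }

    // The question's payload, wrapped; no loaded schema defines ipo:UKAddress.
    static final String INSTANCE =
            "<w:payload xmlns:w='urn:example:wrapper'"
          + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>"
          + "<ipo:purchaseOrder xmlns:ipo='http://www.example.com/IPO'>"
          + "<shipTo exportCode='1' xsi:type='ipo:UKAddress'><name>Helen Zoe</name></shipTo>"
          + "</ipo:purchaseOrder>"
          + "</w:payload>";

    /** Returns null if valid, otherwise the message of the first validation error. */
    static String validate(String processContents) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(wrapperSchema(processContents))));
            schema.newValidator().validate(new StreamSource(new StringReader(INSTANCE)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lax:  " + validate("lax"));
        System.out.println("skip: " + validate("skip"));
    }
}
```

In a real setup, the wrapper schema would be compiled together with the SOAP envelope schema (for example by passing both sources to SchemaFactory.newSchema(Source[])), so that the Body's lax wildcard finds the wrapper's declaration and the wrapper's skip wildcard takes over from there.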