
Using rewrite with lxml


I am generating an XML Schema and then generating data files in Python 3.

The generated schema includes a base schema and I use a catalog to change the include URI to a local file. I set the environment variable 'XML_CATALOG_FILES' in Python and this works great.
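For context, setting the variable from Python looks roughly like the sketch below. The catalog file name is a placeholder, and setting the variable before any parsing rests on my assumption that libxml2 reads it once, when it first loads its catalogs.

```python
import os

# Placeholder catalog location; point this at the real catalog file.
CATALOG_PATH = os.path.abspath("catalog.xml")

# Set this before parsing anything, on the assumption that libxml2
# reads XML_CATALOG_FILES only once, when it first loads its catalogs.
os.environ["XML_CATALOG_FILES"] = CATALOG_PATH

print(os.environ["XML_CATALOG_FILES"])
```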

However, when I use rewriteSystem to substitute the locally generated schema for the generic location referenced in the data files, the rewrite doesn't seem to work.

Here is the catalog.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <!-- S3Model 3.0.0 RM Schema -->
  <uri name="https://www.s3model.com/ns/s3m/s3model_3_0_0.xsd" uri="s3model/s3model_3_0_0.xsd"/>


  <!-- S3Model DMs -->
  <rewriteSystem systemIdStartString="https://dmgen.s3model.com/dmlib/" rewritePrefix="file:///home/tim/DII/Kunteksto/output/"/>
</catalog>

This catalog file works fine in Oxygen when validating with either Xerces or Saxon.

An example reference in the XML file looks like this:

xsi:schemaLocation="https://www.s3model.com/ns/s3m/ https://dmgen.s3model.com/dmlib/dm-a42592f1-e8b3-4862-b6e2-ac0e48c138f4.xsd">
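To make the expected behaviour concrete, here is a plain-Python sketch of the mapping I expect the rewriteSystem entry to perform. It does not call libxml2 at all. The helper names rewrite_system_id and schema_locations are made up for illustration, and the two prefixes are copied from the catalog.

```python
# Prefix pair copied from the rewriteSystem entry in the catalog.
SYSTEM_ID_START = "https://dmgen.s3model.com/dmlib/"
REWRITE_PREFIX = "file:///home/tim/DII/Kunteksto/output/"


def rewrite_system_id(system_id: str) -> str:
    """Swap the remote prefix for the local one; leave other IDs unchanged."""
    if system_id.startswith(SYSTEM_ID_START):
        return REWRITE_PREFIX + system_id[len(SYSTEM_ID_START):]
    return system_id


def schema_locations(value: str) -> dict:
    """Split an xsi:schemaLocation value into {namespace: location} pairs."""
    tokens = value.split()
    return dict(zip(tokens[0::2], tokens[1::2]))


location_attr = (
    "https://www.s3model.com/ns/s3m/ "
    "https://dmgen.s3model.com/dmlib/dm-a42592f1-e8b3-4862-b6e2-ac0e48c138f4.xsd"
)

for namespace, location in schema_locations(location_attr).items():
    print(namespace, "->", rewrite_system_id(location))
```

With the reference above, the schema location should end up pointing into the local output directory, which is what Oxygen does with the same catalog.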

Any ideas why lxml (libxml2) does not recognize this rewriteSystem?


Solution

  • Instead of creating a parser and referencing the schema in the data file, I used a different approach: creating a schema object from the schema string in lxml.

        schema_doc = etree.parse(schema)
        modelSchema = etree.XMLSchema(schema_doc)
    

    The variable schema holds the string representation of the XML schema.

    Then, as each data document is created, it is validated against that schema:

      try:
          tree = etree.parse(StringIO(xmlStr))
          modelSchema.assertValid(tree)
      except etree.DocumentInvalid:
          # Flag documents that fail validation by prefixing their ID.
          file_id = "Invalid_" + file_id
    

    I had to remove the XML declaration:

    <?xml version="1.0" encoding="UTF-8"?>
    

    to get etree.parse to work correctly.
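A possible alternative to stripping the declaration, sketched under the assumption that the schema and the documents are held as str values: encode them to bytes before parsing. As far as I can tell, lxml rejects str input that carries an encoding declaration but accepts the same content as bytes. The tiny schema, the sample documents, and the helper names build_schema and is_valid are made up for illustration and are not part of my real code.

```python
from lxml import etree

# Made-up schema and documents, only to keep the sketch self-contained.
SCHEMA_STR = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item" type="xs:integer"/>
</xs:schema>"""

VALID_DOC = '<?xml version="1.0" encoding="UTF-8"?>\n<item>42</item>'
INVALID_DOC = '<?xml version="1.0" encoding="UTF-8"?>\n<item>not a number</item>'


def build_schema(schema_str: str) -> etree.XMLSchema:
    """Compile a schema held as a str, keeping its XML declaration."""
    # Bytes input lets lxml honour the encoding declaration.
    return etree.XMLSchema(etree.fromstring(schema_str.encode("utf-8")))


def is_valid(schema: etree.XMLSchema, xml_str: str) -> bool:
    """Return True if the document validates against the schema."""
    doc = etree.fromstring(xml_str.encode("utf-8"))
    try:
        schema.assertValid(doc)
        return True
    except etree.DocumentInvalid:
        return False


model_schema = build_schema(SCHEMA_STR)
print(is_valid(model_schema, VALID_DOC))
print(is_valid(model_schema, INVALID_DOC))
```

Because the schema object is built once and reused, each generated document is validated locally without resolving its xsi:schemaLocation at all, so the catalog rewrite is no longer needed for the data files.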