c++ xml xsd xerces xerces-c

Xerces-C validate xml with hardcoded xsd


I'm writing a library that takes XML files and parses them. To prevent users from feeding invalid XML into my application, I'm using Xerces to validate the XML files against an XSD.

However, I only manage to validate against XSD files. In theory, a user could just open such a file and tamper with it. That's why I would like my XSD to be hardcoded in my library.

Unfortunately, I haven't found a way to do this with Xerces-C++ yet.

This is how it works right now:

bool XmlParser::validateXml(std::string a_XsdFilename)
{
    xercesc::XercesDOMParser  domParser;
    if (domParser.loadGrammar(a_XsdFilename.c_str(), xercesc::Grammar::SchemaGrammarType) == NULL)
    {
        throw Exceptions::Parser::XmlSchemaNotReadableException();
    }

    XercesParserErrorHandler parserErrorHandler;

    domParser.setErrorHandler(&parserErrorHandler);
    domParser.setValidationScheme(xercesc::XercesDOMParser::Val_Always);
    domParser.setDoNamespaces(true);
    domParser.setDoSchema(true);
    domParser.setValidationSchemaFullChecking(true);

    domParser.parse(m_Filename.c_str());

    return (domParser.getErrorCount() == 0);

}

std::string m_Filename is a member variable holding the path of the XML file I validate.

std::string a_XsdFilename is the path to the XSD I validate against.

XercesParserErrorHandler inherits from xercesc::ErrorHandler and handles the errors.

How can I replace std::string a_XsdFilename with something like std::string a_XsdText, where a_XsdText contains the schema definition itself instead of the path to a file containing the schema definition?


Solution

    I'll describe three ways to hardcode your XSD in your program:

    Loading the XSD from a file path

    Boris Kolpackov suggests in a blog post that applications should supply the XSD schema files themselves rather than looking them up through the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes found in the XML file.

    The blog post links to load-grammar-dom, an example program (placed in the public domain) that uses the xercesc::DOMLSParser::loadGrammar function:

    user@linux:~$ load-grammar-dom
    usage: load-grammar-dom [test.xsd ... ] [test.xml ...]
    user@linux:~$ 
    

    Loading the XSD from a string

    If you would like to pass the XSD file contents as a string, you would need to use another overload of xercesc::DOMLSParser::loadGrammar where you pass

    const DOMLSInput *source

    instead of

    const char *const systemId

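    The DOMLSInput could be created with the help of xercesc::MemBufInputSource and xercesc::Wrapper4InputSource like this:

    // Wrapper4InputSource adopts (and later deletes) the MemBufInputSource.
    // a_XsdText must stay alive while source is in use.
    xercesc::Wrapper4InputSource source(
        new xercesc::MemBufInputSource(
            reinterpret_cast<const XMLByte *>(a_XsdText.c_str()),
            a_XsdText.size(),
            "A name"));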
    

    (Adapted somewhat from https://stackoverflow.com/a/15829424/757777 but untested)

    Loading the XSD from a precompiled binary

    The embedded example included with CodeSynthesis XSD (placed in the public domain) demonstrates how to use

    xercesc::BinInputStream and xercesc::XMLGrammarPool::deserializeGrammars

    to load a precompiled XSD schema.

    See also the example's README.

    The example contains the program xsdbin that compiles XSD schema files into a binary file.

    user@linux:~$ xsdbin --help
    Usage: xsdbin [options] <files>
    Options:
      --help                 Print usage information and exit.
      --verbose              Print progress information.
      --output-dir <dir>     Write generated files to <dir>.
      --hxx-suffix <sfx>     Header file suffix instead of '-schema.hxx'.
      --cxx-suffix <sfx>     Source file suffix instead of '-schema.cxx'.
      --array-name <name>    Binary data array name.
      --disable-multi-import Disable multiple import support.
    user@linux:~$
    

    In the example's makefile, xsdbin precompiles the XSD schema file, and the generated source is compiled into the example executable.