java xml xsd saxparser xerces2-j

What do Public Identifier, System Identifier, and Base system identifier refer to in XML?


The Xerces2-J XMLInputSource and the SAX InputSource both refer to public and system identifiers. The Xerces2-J XMLInputSource additionally refers to a base system identifier.

What do these identifiers represent?

Edit: Xerces-J, when given a file location as the SystemId, will open the file as input. If the input is instead provided as a byte stream from some other source, such as a database, is there any purpose to the public or system id?


Solution

  • If you look at the XML syntax, you will see, for example, that external entity declarations use the syntax:

    ExternalID ::= 'SYSTEM' S SystemLiteral
      | 'PUBLIC' S PubidLiteral S SystemLiteral
    

    Here's an example of this syntax in use:

    <!ENTITY open-hatch
             PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
             "http://www.textuality.com/boilerplate/OpenHatch.xml">
    

    References to DTDs work in the same way (in fact, an external DTD is technically one kind of entity).

    The "system identifier" is a URI that identifies where the text of an entity can be found. The "public identifier" (a hangover from SGML) is more like a name for the resource; it only helps you find the resource if you have some kind of index or catalog that tells you where to look.
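    As a rough illustration of the "index or catalog" idea, the sketch below uses a SAX EntityResolver to map a public identifier to locally held text, so the system identifier is never fetched. The class name, the in-memory map, and the replacement text are all hypothetical; a real setup would more likely use an XML catalog file, and this uses the JDK's bundled SAX parser rather than Xerces2-J's native XMLInputSource API.

    ```java
    import java.io.StringReader;
    import java.util.Map;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.DefaultHandler;

    public class PublicIdCatalogDemo {
        // Hypothetical in-memory "catalog": public identifier -> entity text.
        static final Map<String, String> CATALOG = Map.of(
            "-//Textuality//TEXT Standard open-hatch boilerplate//EN",
            "This document is open-hatch boilerplate.");

        // Sample document declaring and using the external entity from the answer.
        static final String SAMPLE =
            "<!DOCTYPE doc [<!ENTITY open-hatch "
            + "PUBLIC \"-//Textuality//TEXT Standard open-hatch boilerplate//EN\" "
            + "\"http://www.textuality.com/boilerplate/OpenHatch.xml\">]>"
            + "<doc>&open-hatch;</doc>";

        // Parses the XML and returns the character data seen by the handler.
        static String parse(String xml) throws Exception {
            StringBuilder text = new StringBuilder();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    String local = (publicId == null) ? null : CATALOG.get(publicId);
                    if (local == null) {
                        return null; // not in the catalog: parser falls back to the system identifier
                    }
                    InputSource source = new InputSource(new StringReader(local));
                    source.setPublicId(publicId);
                    source.setSystemId(systemId);
                    return source;
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            };
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return text.toString();
        }

        public static void main(String[] args) throws Exception {
            System.out.println(parse(SAMPLE));
        }
    }
    ```

    Returning null from resolveEntity tells the parser to open the system identifier itself, which is the behaviour you get when no catalog entry matches.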

    System identifiers are often given as relative URI references (for example "books.dtd"), which need to be resolved relative to a base URI. The base URI is generally the location where the containing resource (or entity) was found. For example, if an XML document is at http://my.com/lib/books.xml, then relative references are resolved against http://my.com/lib/, and the relative URI books.dtd expands to http://my.com/lib/books.dtd.
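    The resolution step can be sketched with java.net.URI, which follows the standard relative-reference rules. The class and method names here are made up for illustration; a parser does the equivalent internally when it combines a base system identifier with a system identifier.

    ```java
    import java.net.URI;

    public class BaseUriDemo {
        // Resolve a (possibly relative) system identifier against a base system identifier.
        static String resolveSystemId(String baseSystemId, String systemId) {
            return URI.create(baseSystemId).resolve(systemId).toString();
        }

        public static void main(String[] args) {
            String base = "http://my.com/lib/books.xml";
            // prints http://my.com/lib/books.dtd
            System.out.println(resolveSystemId(base, "books.dtd"));
            // An absolute system identifier is returned unchanged.
            System.out.println(resolveSystemId(base, "http://other.org/x.dtd"));
        }
    }
    ```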

    As for your question "is there any purpose to the public or system id": no, if the document consists entirely of a single entity (which is often the case). But as soon as multiple entities come into play, you need identifiers to link them together.
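    For the byte-stream case in the question, here is a minimal sketch of what that linking looks like in practice, assuming the JDK's bundled SAX parser. The document arrives as bytes, but it names a relative DTD, so the system id set on the InputSource serves as the base for resolving it. The class name, method name, and URLs are hypothetical, and the resolver hands back an empty stand-in DTD so nothing is fetched.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.StringReader;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.DefaultHandler;

    public class ByteStreamSystemIdDemo {
        // Parses XML from bytes and records every system id the parser asks to resolve.
        static List<String> requestedSystemIds(byte[] xmlBytes, String documentSystemId)
                throws Exception {
            List<String> seen = new ArrayList<>();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    seen.add(systemId);
                    // Empty stand-in DTD so no network or file access takes place.
                    return new InputSource(new StringReader(""));
                }
            };
            InputSource input = new InputSource(new ByteArrayInputStream(xmlBytes));
            // The content comes from the byte stream; the system id only supplies the base URI.
            input.setSystemId(documentSystemId);
            SAXParserFactory.newInstance().newSAXParser().parse(input, handler);
            return seen;
        }

        public static void main(String[] args) throws Exception {
            byte[] xml = "<!DOCTYPE books SYSTEM \"books.dtd\"><books/>"
                .getBytes(StandardCharsets.UTF_8);
            System.out.println(requestedSystemIds(xml, "http://my.com/lib/books.xml"));
        }
    }
    ```

    SAX requires the parser to resolve a URL system identifier fully before reporting it, so the resolver should be handed the DTD's absolute location relative to the id you set. Without setSystemId, the parser has no meaningful base and typically falls back to the current working directory.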