
Does PHP support recursive processing of xi:include elements?


What is the best PHP code to load an XML file that uses correct XInclude syntax (with XPointer) and resolve the inclusions recursively?

Example (the XInclude syntax should be correct): index.xml includes legal/sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>
<urlset>
    <url>
        <loc>/privacy/</loc>
        <query>/?template=home&amp;content=home</query>
    </url>
    <xi:include href="legal/sitemap.xml" xpointer="xpointer(//urlset/*)"/>
</urlset>

legal/sitemap.xml includes a node value from legal/cookies.xml:

<urlset>
    <url>
        <loc>/cookies/</loc>
        <query>/?template=page&amp;content=cookies</query>
        <lastmod><xi:include href="cookies.xml" xpointer="xpointer(//*[1]/datePublished/text())"/></lastmod>
    </url>
</urlset>

legal/cookies.xml:

<?xml version="1.0" encoding="UTF-8"?>
<section xml:id="php" class="page">
    <title>Cookies</title>
    <datePublished>2018-11-28T12:02:41Z</datePublished>
</section>

Expected output: the complete XML, with both first- and second-level inclusions resolved:

<?xml version="1.0" encoding="UTF-8"?>
<urlset>
    <url>
        <loc>/privacy/</loc>
        <query>/?template=home&amp;content=home</query>
        <lastmod>2017-11-29T12:02:30Z</lastmod>
    </url>
    <url>
        <loc>/cookies/</loc>
        <query>/?template=page&amp;content=cookies</query>
        <lastmod>2018-11-28T12:02:41Z</lastmod>
    </url>
</urlset>

PHP's DOMDocument::xinclude() correctly processes first-level includes (index.xml including legal/sitemap.xml), but it does not process second-level includes: the node value from legal/cookies.xml is not substituted, and the ‘xi:include’ from legal/sitemap.xml is left in place. This is the output:

<?xml version="1.0" encoding="UTF-8"?>
<urlset>
    <url>
        <loc>/privacy/</loc>
        <query>/?template=home&amp;content=home</query>
        <lastmod><xi:include href="cookies.xml" xpointer="xpointer(//*[1]/datePublished/text())"/></lastmod>
    </url>
    <url>
        <loc>/cookies/</loc>
        <query>/?template=page&amp;content=cookies</query>
        <lastmod>2018-11-28T12:02:41Z</lastmod>
    </url>
</urlset>

Calling ‘$DOMDocument->xinclude()’ a second time returns the same output.
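To see which inclusions were left unresolved after calling xinclude(), one can list the xi:include elements that remain in the document. This is a minimal sketch; the helper name unresolvedIncludes is hypothetical. It matches on the qualified name "xi:include" rather than on the XInclude namespace, because an xi:include whose xi prefix was never declared is not in that namespace at all, and a namespace-based lookup would silently miss it.

```php
<?php
// Hypothetical helper: return the href of every xi:include element still
// present in $doc. Matching on name() catches both properly namespaced
// elements and ones whose "xi" prefix was never bound to the XInclude
// namespace (libxml2 keeps the full "xi:include" name for those).
function unresolvedIncludes(DOMDocument $doc): array
{
    $xpath = new DOMXPath($doc);
    $hrefs = [];
    foreach ($xpath->query('//*[name() = "xi:include"]') as $node) {
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}
```

Called right after $xml->xinclude(), a non-empty result means some inclusions were not processed.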


Solution

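  • The only thing I can find missing is the namespace declaration: both index.xml and legal/sitemap.xml must declare the xi prefix (xmlns:xi="http://www.w3.org/2001/XInclude"). Without it, the xi:include in legal/sitemap.xml is not recognized as an XInclude element, so it is never processed. So with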

    index.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns:xi="http://www.w3.org/2001/XInclude">
        <url>
            <loc>/privacy/</loc>
            <query>/?template=home&amp;content=home</query>
        </url>
        <xi:include href="legal/sitemap.xml" xpointer="xpointer(//urlset/*)"/>
    </urlset>
    

    legal/sitemap.xml

    <urlset xmlns:xi="http://www.w3.org/2001/XInclude">
        <url>
            <loc>/cookies/</loc>
            <query>/?template=page&amp;content=cookies</query>
            <lastmod><xi:include href="cookies.xml" xpointer="xpointer(//*[1]/datePublished/text())"/></lastmod>
        </url>
    </urlset>
    

    legal/cookies.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <section xml:id="php" class="page">
        <title>Cookies</title>
        <datePublished>2018-11-28T12:02:41Z</datePublished>
    </section>
    

    and the code...

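    $xml = new DOMDocument();
    $xml->load("index.xml");     // relative hrefs resolve against this file's location
    $xml->xinclude();            // also resolves xi:include elements inside included files
    echo $xml->saveXML();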
    

    you end up with

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns:xi="http://www.w3.org/2001/XInclude">
        <url>
            <loc>/privacy/</loc>
            <query>/?template=home&amp;content=home</query>
        </url>
        <url xmlns:xi="http://www.w3.org/2001/XInclude" xml:base="legal/sitemap.xml">
            <loc>/cookies/</loc>
            <query>/?template=page&amp;content=cookies</query>
            <lastmod>2018-11-28T12:02:41Z</lastmod>
        </url>
    </urlset>
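To check this without touching real site files, the three files can be recreated in a temporary directory and processed with the same calls. This is a minimal sketch: the temporary directory and the variable names are assumptions for the demo, and it assumes PHP's DOM extension is built against a libxml2 with XInclude and XPointer support (the usual case).

```php
<?php
// Sketch: rebuild index.xml, legal/sitemap.xml and legal/cookies.xml in a
// throwaway temp directory (an assumption for this demo) and resolve them.
$dir = sys_get_temp_dir() . '/xinclude_demo_' . uniqid();
mkdir($dir . '/legal', 0777, true);

$index = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xi="http://www.w3.org/2001/XInclude">
    <url>
        <loc>/privacy/</loc>
        <query>/?template=home&amp;content=home</query>
    </url>
    <xi:include href="legal/sitemap.xml" xpointer="xpointer(//urlset/*)"/>
</urlset>
XML;

// The nested include is processed only because xi is declared here too.
$sitemap = <<<'XML'
<urlset xmlns:xi="http://www.w3.org/2001/XInclude">
    <url>
        <loc>/cookies/</loc>
        <query>/?template=page&amp;content=cookies</query>
        <lastmod><xi:include href="cookies.xml" xpointer="xpointer(//*[1]/datePublished/text())"/></lastmod>
    </url>
</urlset>
XML;

$cookies = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<section xml:id="php" class="page">
    <title>Cookies</title>
    <datePublished>2018-11-28T12:02:41Z</datePublished>
</section>
XML;

file_put_contents("$dir/index.xml", $index);
file_put_contents("$dir/legal/sitemap.xml", $sitemap);
file_put_contents("$dir/legal/cookies.xml", $cookies);

$xml = new DOMDocument();
$xml->load("$dir/index.xml");   // hrefs resolve relative to this file
$result = $xml->xinclude();     // substitution count, -1 on failure, false if none
$out = $xml->saveXML();
echo $out;

// Remove the temporary files again.
unlink("$dir/legal/cookies.xml");
unlink("$dir/legal/sitemap.xml");
unlink("$dir/index.xml");
rmdir("$dir/legal");
rmdir($dir);
```

Removing xmlns:xi from $sitemap should reproduce the behaviour described in the question, with the inner xi:include left in the output.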