php xml xml-parsing simplexml dublin-core

How can I access the information in the 'Dublin Core' namespace received from external XML?



For the last day I have been struggling with XML parsing in PHP. I use an external service, provided by the German National Library, that returns information about books as XML, with an ISBN as the search term: https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/SRU/sru_node.html. The service requires a private token in the request; I have already checked that this is not the cause of the problem, and I have also checked that 'allow_url_fopen' is enabled in php.ini.

My problem is that, whatever XML parsing method I use, the book information I need is neither displayed nor accessible in the SimpleXMLElement object (see the output of the second 'echo' in my code below). If I first fetch the XML as a string, the information is visible and accessible (see the output of the first 'echo'). The goal is to access the information about the book individually by element name (dc:title, dc:creator, dc:publisher, dc:date, etc.). With my current code this is not possible: PHP reports "Warning: main(): Node no longer exists" when it reaches the 'foreach' loop.

I have already looked at several Stack Overflow posts about namespace problems in SimpleXMLElement objects, but I wasn't able to adapt the solutions proposed there to the problem I face here.
I hope that somebody can help me with this and point me to a solution, so I can access the information in the 'dc' namespace of the XML.

This is the very short and simple PHP code I have used so far:

$request = file_get_contents("http://externalXML.com"); //URL was replaced
echo "<pre>"; print_r($request); echo "</pre>"; 
$xml = simplexml_load_string($request);
echo "<pre>"; print_r($xml); echo "</pre>"; 
foreach ($xml->records->record->recordData->dc->children() as $child) {
    echo "Inhalt: " . $child . "<br>";
}

And this is the content of the XML. Since I always search for a unique ISBN (see the 'query' element), there can only be zero or one result, never more:

<searchRetrieveResponse>
<version>1.1</version>
<numberOfRecords>1</numberOfRecords>
<records>
    <record>
    <recordSchema>oai_dc</recordSchema>
    <recordPacking>xml</recordPacking>
    <recordData>
        <dc>
            <dc:title>1968 : Worauf wir stolz sein dürfen / Gretchen Dutschke</dc:title>
            <dc:creator>Dutschke, Gretchen [Verfasser]</dc:creator>
            <dc:publisher>Hamburg : Sven Murmann Verlagsgesellschaft mbH</dc:publisher>
            <dc:date>2018</dc:date>
            <dc:language>ger</dc:language>
            <dc:identifier xsi:type="tel:URN">urn:nbn:de:101:1-201803147211</dc:identifier>
            <dc:identifier xsi:type="tel:URL">http://nbn-resolving.de/urn:nbn:de:101:1-201803147211</dc:identifier>
            <dc:identifier xsi:type="tel:ISBN">978-3-96196-007-1</dc:identifier>
            <dc:identifier xsi:type="tel:URL">http://d-nb.info/1154519600/34</dc:identifier>
            <dc:identifier xsi:type="tel:URL">https://www.kursbuch.online</dc:identifier>
            <dc:identifier xsi:type="dnb:IDN">1154519600</dc:identifier>
            <dc:subject>300 Sozialwissenschaften, Soziologie, Anthropologie</dc:subject>
            <dc:type>Online-Ressource</dc:type>
            <dc:relation>http://d-nb.info/1144647959</dc:relation>
        </dc>
    </recordData>
    <recordPosition>1</recordPosition>
    </record>
</records>
<nextRecordPosition>2</nextRecordPosition>
<echoedSearchRetrieveRequest>
<version>1.1</version>
<query>"9783961960071"</query>
<xQuery xsi:nil="true"/>
</echoedSearchRetrieveRequest>
</searchRetrieveResponse>

Cheers, Timo


Solution

  • Note: If the missing declarations are just a mistake in the question, this should be marked as a duplicate of Reference - how do I handle namespaces (tags and attributes with colon in) in SimpleXML?

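    If the XML is actually as shown in the question, it is broken: the namespace prefixes dc and xsi are used but never declared, so the document is not namespace-well-formed. If you check your PHP logs, or turn on display_errors, you will see a warning for each use of an undeclared prefix (such as "Namespace prefix dc on title is not defined") every time the XML is parsed.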
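To see those parser complaints without digging through logs, one option is to let libxml collect them and print them yourself. This is only a minimal sketch: the `$request` string is a hypothetical, shortened stand-in for the real response, and `$errors` is a name introduced here for illustration.

```php
<?php
// Hypothetical, shortened stand-in for the response:
// the dc prefix is used but never declared
$request = '<searchRetrieveResponse><records><record><recordData><dc>'
    . '<dc:title>1968</dc:title>'
    . '</dc></recordData></record></records></searchRetrieveResponse>';

// Collect libxml errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

$xml = simplexml_load_string($request);

// Each entry is a LibXMLError object with level, code, message and line
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// Empty the buffer so later parses start with a clean list
libxml_clear_errors();
```

Collecting the errors this way keeps them out of the page output while still making it obvious which prefixes the parser could not resolve.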

    To work around this broken XML, you could wrap the response in an extra root element that defines the namespaces, resulting in valid XML.

    // Define your namespace URIs somewhere, for reference
    // Since you're faking them, they could be anything you like, but in case the XML
    //  is fixed in future, you might as well use the values that were probably intended
    define('XMLNS_DUBLIN_CORE', 'http://purl.org/dc/elements/1.1/');
    define('XMLNS_XSD_INSTANCE', 'http://www.w3.org/2001/XMLSchema-instance');
    
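    // Add a wrapper with the missing namespace declarations around the whole document
    // (this assumes the response does not start with an XML declaration, i.e. a "<?xml" line;
    //  that is only allowed at the very beginning of a document and would have to be stripped first)
    $request = '<dummy xmlns:dc="' . XMLNS_DUBLIN_CORE . '" xmlns:xsi="' . XMLNS_XSD_INSTANCE . '">'
        . $request
        . "</dummy>";
    
    // Parse the now-valid XML; simplexml_load_string() returns false on failure
    $xml = simplexml_load_string($request);
    if ($xml === false) {
        die('Could not parse the XML response');
    }
    
    // Pop the wrapper off to get the original root element
    $xml = $xml->children()[0];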
    
    // Proceed as though the document had been defined properly
    echo $xml->records->record->recordData->dc->children(XMLNS_DUBLIN_CORE)->title;
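Since the question asks for every dc field individually, the same approach can be extended to loop over all children in the Dublin Core namespace and to read the xsi:type attribute of each identifier. The following is a sketch under assumptions: the `$request` string is a hypothetical, shortened copy of the already wrapped response, and the `$fields` and `$identifiers` arrays are names introduced here for illustration.

```php
<?php
// Namespace URIs as used in the answer above
const XMLNS_DUBLIN_CORE = 'http://purl.org/dc/elements/1.1/';
const XMLNS_XSD_INSTANCE = 'http://www.w3.org/2001/XMLSchema-instance';

// Hypothetical, shortened stand-in for the wrapped response
$request = '<dummy xmlns:dc="' . XMLNS_DUBLIN_CORE . '" xmlns:xsi="' . XMLNS_XSD_INSTANCE . '">'
    . '<searchRetrieveResponse><records><record><recordData><dc>'
    . '<dc:title>1968 : Worauf wir stolz sein dürfen / Gretchen Dutschke</dc:title>'
    . '<dc:date>2018</dc:date>'
    . '<dc:identifier xsi:type="tel:ISBN">978-3-96196-007-1</dc:identifier>'
    . '<dc:identifier xsi:type="dnb:IDN">1154519600</dc:identifier>'
    . '</dc></recordData></record></records></searchRetrieveResponse>'
    . '</dummy>';

// Parse and pop the wrapper off, as in the answer
$xml = simplexml_load_string($request)->children()[0];

// Select only the children of <dc> that are in the Dublin Core namespace
$dc = $xml->records->record->recordData->dc->children(XMLNS_DUBLIN_CORE);

$fields = [];      // local element name => list of values
$identifiers = []; // xsi:type value => identifier (last one wins if a type repeats)
foreach ($dc as $name => $element) {
    $fields[$name][] = (string) $element;
    if ($name === 'identifier') {
        // Attributes live in their own namespace, so select that one explicitly
        $type = (string) $element->attributes(XMLNS_XSD_INSTANCE)->type;
        $identifiers[$type] = (string) $element;
    }
}

echo $fields['title'][0], "\n";
echo $identifiers['tel:ISBN'], "\n";
```

Each field is stored as a list because elements such as dc:identifier occur several times in one record. Casting to string copies the values out of the SimpleXMLElement objects, so they can be used after the XML object is gone.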