php xml parsing simplexml xmlreader

How to use XMLReader in PHP?


I have the following XML file. It is rather large, and I haven't been able to get SimpleXML to open and read it, so I'm trying XMLReader in PHP, so far with no success.

<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
    <last_updated>2009-11-30 13:52:40</last_updated>
    <product>
        <element_1>foo</element_1>
        <element_2>foo</element_2>
        <element_3>foo</element_3>
        <element_4>foo</element_4>
    </product>
    <product>
        <element_1>bar</element_1>
        <element_2>bar</element_2>
        <element_3>bar</element_3>
        <element_4>bar</element_4>
    </product>
</products>

Unfortunately, I haven't found a good tutorial on this for PHP, and I would love to see how I can read each element's content so that I can store it in a database.


Solution

  • It all depends on how big the unit of work is, but I guess you're trying to process each <product/> node in succession.

    For that, the simplest way is to use XMLReader to get to each node, then use SimpleXML to access its contents. This keeps memory usage low because you handle one node at a time, and you still get SimpleXML's ease of use. For instance:

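    $z = new XMLReader;
    // open() returns false if the file cannot be opened, so check it
    if (!$z->open('data.xml')) {
        die('Could not open data.xml');
    }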
    
    $doc = new DOMDocument;
    
    // move to the first <product /> node
    while ($z->read() && $z->name !== 'product');
    
    // now that we're at the right depth, hop to the next <product/> until the end of the tree
    while ($z->name === 'product')
    {
        // either one should work
        //$node = new SimpleXMLElement($z->readOuterXML());
        $node = simplexml_import_dom($doc->importNode($z->expand(), true));
    
        // now you can use $node without going insane about parsing
        var_dump($node->element_1);
    
        // go to next <product />
        $z->next('product');
    }
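
    Since the question asks about storing each element in a database, here is a minimal sketch of how the loop above could feed a prepared statement. It assumes the pdo_sqlite extension is available. The in-memory SQLite database, the "products" table, its two columns, and the temporary sample file are all hypothetical stand-ins for your real setup.

    ```php
    <?php
    // Hypothetical sample file so the sketch runs on its own; use your real data.xml instead.
    $file = tempnam(sys_get_temp_dir(), 'xml');
    file_put_contents($file,
        '<products><last_updated>2009-11-30 13:52:40</last_updated>'
        . '<product><element_1>foo</element_1><element_2>foo</element_2></product>'
        . '<product><element_1>bar</element_1><element_2>bar</element_2></product>'
        . '</products>');

    // Assumption: an in-memory SQLite database with a made-up "products" table.
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE products (element_1 TEXT, element_2 TEXT)');
    $insert = $pdo->prepare('INSERT INTO products (element_1, element_2) VALUES (?, ?)');

    $z = new XMLReader;
    if (!$z->open($file)) {
        throw new RuntimeException('Could not open ' . $file);
    }
    $doc = new DOMDocument;

    // move to the first <product /> node
    while ($z->read() && $z->name !== 'product');

    // one transaction for the whole import keeps the inserts fast
    $pdo->beginTransaction();
    while ($z->name === 'product') {
        $node = simplexml_import_dom($doc->importNode($z->expand(), true));

        // cast to string: SimpleXML returns objects, not plain values
        $insert->execute([(string) $node->element_1, (string) $node->element_2]);

        $z->next('product');
    }
    $pdo->commit();
    $z->close();
    unlink($file);

    $count = (int) $pdo->query('SELECT COUNT(*) FROM products')->fetchColumn();
    echo "Stored $count products\n";
    ```

    Preparing the statement once and executing it per node means only one <product/> is held in memory at a time, while the database sees a single transaction.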
    

    Quick overview of pros and cons of different approaches:

    XMLReader only

    XMLReader + SimpleXML

    XMLReader + DOM
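
    For comparison, the XMLReader + DOM variant could look roughly like the sketch below. It uses the same expand() call as the example above but walks the resulting DOM node directly instead of handing it to SimpleXML. The temporary sample file and the $rows array are assumptions made for illustration only.

    ```php
    <?php
    // Hypothetical sample file so the sketch runs on its own.
    $file = tempnam(sys_get_temp_dir(), 'xml');
    file_put_contents($file,
        '<products><last_updated>2009-11-30 13:52:40</last_updated>'
        . '<product><element_1>foo</element_1><element_2>foo</element_2></product>'
        . '<product><element_1>bar</element_1><element_2>bar</element_2></product>'
        . '</products>');

    $z = new XMLReader;
    if (!$z->open($file)) {
        throw new RuntimeException('Could not open ' . $file);
    }
    $doc = new DOMDocument;
    $rows = [];

    // move to the first <product /> node
    while ($z->read() && $z->name !== 'product');

    while ($z->name === 'product') {
        // copy the current node (and its children) into our own document
        $product = $doc->importNode($z->expand(), true);

        // collect child elements as name => text, skipping whitespace nodes
        $row = [];
        foreach ($product->childNodes as $child) {
            if ($child->nodeType === XML_ELEMENT_NODE) {
                $row[$child->nodeName] = $child->textContent;
            }
        }
        $rows[] = $row;

        $z->next('product');
    }
    $z->close();
    unlink($file);

    print_r($rows);
    ```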

    My advice: write a prototype with SimpleXML and see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the more likely you are to introduce bugs or performance regressions.
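
    To give a feel for the extra bookkeeping an XMLReader-only parser needs, here is a rough sketch that collects the same data using nothing but read(). The state variables ($row, $field), the $rows array, and the temporary sample file are hypothetical, and the sketch assumes each <product/> contains only flat child elements as in the question.

    ```php
    <?php
    // Hypothetical sample file so the sketch runs on its own.
    $file = tempnam(sys_get_temp_dir(), 'xml');
    file_put_contents($file,
        '<products><last_updated>2009-11-30 13:52:40</last_updated>'
        . '<product><element_1>foo</element_1><element_2>foo</element_2></product>'
        . '<product><element_1>bar</element_1><element_2>bar</element_2></product>'
        . '</products>');

    $z = new XMLReader;
    if (!$z->open($file)) {
        throw new RuntimeException('Could not open ' . $file);
    }

    $rows  = [];
    $row   = null;  // the <product> being built, or null when outside one
    $field = null;  // name of the child element currently being read

    while ($z->read()) {
        if ($z->nodeType === XMLReader::ELEMENT) {
            if ($z->name === 'product') {
                $row = [];
                if ($z->isEmptyElement) {   // <product/> has no end tag event
                    $rows[] = $row;
                    $row = null;
                }
            } elseif ($row !== null) {
                $row[$z->name] = '';
                // an empty element such as <element_1/> has no text to collect
                $field = $z->isEmptyElement ? null : $z->name;
            }
        } elseif ($field !== null
            && ($z->nodeType === XMLReader::TEXT || $z->nodeType === XMLReader::CDATA)) {
            $row[$field] .= $z->value;
        } elseif ($z->nodeType === XMLReader::END_ELEMENT) {
            if ($z->name === 'product') {
                $rows[] = $row;
                $row = null;
            }
            $field = null;
        }
    }
    $z->close();
    unlink($file);

    print_r($rows);
    ```

    Even for this flat structure, the parser has to track where it is in the document by hand, which is the kind of code the advice above suggests avoiding unless you really need it.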