php xml apache memory-limit

PHP memory limit issue while scraping a huge file


I'm scraping a huge XML file (300k lines, ~11 MB) with Simple HTML DOM and running into memory limits. So I added some ini_set calls to override the default php.ini settings and lift the memory and execution time limits entirely. Bad idea.

My code:

include('simple_html_dom.php');
ini_set('memory_limit', '-1');       // remove the memory limit
ini_set('max_execution_time', '0');  // 0 disables the execution time limit
$xml = file_get_contents('HugeFile.xml');
$xml2 = new simple_html_dom();
$xml2->load($xml);

// Replace the text inside every <tag1> element
foreach ($xml2->find('tag1') as $element) {
    $element->innertext = str_replace('text to replace', 'new text', $element->innertext);
}

$xml2->save('output.xml');

Now, is there a way to make this script run in a reasonable time without any memory issues? This could be done easily in a text editor, but I need to automate it because I have plenty of files to edit.


Solution

  • Found a better way to do it: there is no need for the DOM here. I just run str_replace on the string returned by file_get_contents, then write the result to another file with file_put_contents. Simple and neat:

    $xml = file_get_contents('HugeFile.xml');
    $new = str_replace('text to replace', 'new text', $xml);
    file_put_contents('output.xml', $new);
    

    And preg_replace may come in handy for complex modifications.
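
One thing to keep in mind: a plain str_replace on the whole file changes every occurrence, while the original loop only touched the contents of tag1 elements. If that distinction matters, something along these lines might work. It is only a rough sketch: replaceInsideTag and processDirectory are hypothetical helper names, the directory arguments are placeholders, and the regex assumes tag1 elements are not nested inside each other. It is pattern matching, not real XML parsing.

```php
<?php
// Hypothetical helper: replace $search with $replace only inside <tag>...</tag>.
// Assumes elements named $tag are never nested inside each other.
function replaceInsideTag(string $xml, string $tag, string $search, string $replace): string
{
    $name = preg_quote($tag, '~');
    // Group 1: opening tag (optional attributes, self-closing tags are skipped),
    // group 2: element content, group 3: closing tag.
    $pattern = '~(<' . $name . '(?:\s[^>]*)?(?<!/)>)(.*?)(</' . $name . '>)~s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($search, $replace): string {
            return $m[1] . str_replace($search, $replace, $m[2]) . $m[3];
        },
        $xml
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $xml;
}

// Hypothetical batch loop over every .xml file in $inDir.
// Returns the number of files written to $outDir.
function processDirectory(string $inDir, string $outDir): int
{
    $count = 0;
    foreach (glob($inDir . '/*.xml') ?: [] as $path) {
        $xml = file_get_contents($path);
        if ($xml === false) {
            continue; // skip unreadable files
        }
        $new = replaceInsideTag($xml, 'tag1', 'text to replace', 'new text');
        file_put_contents($outDir . '/' . basename($path), $new);
        $count++;
    }
    return $count;
}
```

Calling processDirectory('input', 'output') would then cover the "plenty of files" case, assuming both directories exist. Each file is still read fully into memory, which is fine for files around 11 MB.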