I'm scraping this huge XML file (300k lines, ~11 MB) with Simple HTML DOM and running into memory-limit issues. So I added some php.ini overrides to lift the default limits and give the script full control over memory. Bad idea.
My code:
include('simple_html_dom.php');

// Lift the default limits (this is the part that turned out to be a bad idea)
ini_set('memory_limit', '-1');
ini_set('max_execution_time', '-1');

$xml = file_get_contents('HugeFile.xml');
$xml2 = new simple_html_dom();
$xml2->load($xml);

foreach ($xml2->find('tag1') as $element) {
    $element->innertext = str_replace('text to replace', 'new text', $element->innertext);
}

$xml2->save('output.xml');
Now, is there a way to make this script run smoothly in a reasonable time without any memory issues? This can be done easily with a text editor, but I need to automate it as I have plenty of files to edit.
Found a better way to do it: no need for the DOM here. I just str_replace the text inside the string returned by file_get_contents, then put it in another file with file_put_contents. Simple and neat:
$xml = file_get_contents('HugeFile.xml');
$new = str_replace('text to replace', 'new text', $xml);
file_put_contents('output.xml', $new);
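
And since I have plenty of files to edit, the same thing runs nicely in a loop. A minimal sketch, assuming the inputs sit in a files/ folder and the results go to output/ (both folder names are just placeholders):

// Sketch only: apply the same replacement to every .xml file in a folder.
// 'files/' and 'output/' are placeholder directory names.
foreach (glob('files/*.xml') as $path) {
    $xml = file_get_contents($path);
    $new = str_replace('text to replace', 'new text', $xml);
    file_put_contents('output/' . basename($path), $new);
}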
And preg_replace may come in handy for more complex modifications.
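
For instance, something along these lines (the pattern and tag name here are purely illustrative):

// Illustrative only: rewrite the contents of every <tag1> element with a regex.
// The /s modifier lets . match newlines, and .*? keeps the match non-greedy.
$xml = file_get_contents('HugeFile.xml');
$new = preg_replace('/<tag1>.*?<\/tag1>/s', '<tag1>new text</tag1>', $xml);
file_put_contents('output.xml', $new);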