
preg_replace in simple html dom


I'm trying to grab the latest news from a website and include it on my own site. That site uses Joomla (ugh), and the hrefs in the content are relative, so they are missing the base href.

So a link holds contensite.php?blablabla, and on my page it resolves to http://www.example.com/contensite.php?blablabla instead of pointing at the original site.

So I thought of putting http://www.basehref.com in front of each link before echoing it out, but that's where my knowledge stops.

Which should I use, preg_replace or str_replace? I'm not sure.
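For reference, here is roughly what the string-based idea could look like. A plain str_replace can't tell relative links from absolute ones, so it would also prefix links that already start with http://; preg_replace_callback with a negative lookahead can skip those. This is a minimal sketch, assuming double-quoted href/src attributes and links relative to the site root. `prefixRelativeUrls` is a hypothetical helper, and it does not handle ../ segments or a <base> tag.

```php
<?php
// Hypothetical helper: prefix relative href/src values in an HTML string.
// Assumes double-quoted attributes and root-relative paths (no ../ handling).
function prefixRelativeUrls(string $html, string $base): string
{
    $base = rtrim($base, '/') . '/';
    return preg_replace_callback(
        // Skip values that start with a scheme (http:, mailto:), "//" or "#".
        '/\b(href|src)="(?![a-z][a-z0-9+.-]*:|\/\/|#)\/?([^"]*)"/i',
        function (array $m) use ($base): string {
            return $m[1] . '="' . $base . $m[2] . '"';
        },
        $html
    );
}

$html = '<a href="contensite.php?id=1">News</a> <a href="http://other.org/">Ext</a>';
echo prefixRelativeUrls($html, 'http://www.targetsite.com'), "\n";
// <a href="http://www.targetsite.com/contensite.php?id=1">News</a> <a href="http://other.org/">Ext</a>
```

Regex on raw HTML breaks easily (single-quoted or unquoted attributes, a <base> tag, ../ paths), which is why resolving URLs on the parsed DOM is more robust.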


Solution

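    <?php
    include_once('db_connect.php'); // connect to my db
    require_once('Net/URL2.php');   // PEAR Net_URL2, resolves relative URLs
    include_once('dom.php');        // simple_html_dom library

    // URL of the page being scraped; also the default base for its links.
    $pageUrl = 'http://www.targetsite.com';

    // Fetch the page and parse it with simple_html_dom (false on failure).
    $dom = file_get_html($pageUrl);
    if ($dom === false) {
        die('Could not fetch ' . $pageUrl);
    }

    // The div that holds the news items.
    $elem2 = $dom->find('div[class=blog]', 0);
    if ($elem2 === null) {
        die('No div.blog found');
    }

    // Relative links resolve against the page URL, unless the page declares
    // a <base href>. <base> lives in <head>, so search the whole document.
    $uri = new Net_URL2($pageUrl);
    $baseURI = $uri;
    $base = $dom->find('base[href]', 0);
    if ($base !== null) {
        $baseURI = $uri->resolve($base->href);
    }

    // Rewrite every src and href inside the div to an absolute URL.
    foreach ($elem2->find('*[src]') as $elem) {
        $elem->src = $baseURI->resolve($elem->src)->__toString();
    }
    foreach ($elem2->find('*[href]') as $elem) {
        if (strtoupper($elem->tag) === 'BASE') continue;
        $elem->href = $baseURI->resolve($elem->href)->__toString();
    }

    echo $elem2;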
    

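This rewrites every relative src and href inside the div to an absolute URL on the source site, which fixes the broken links. It requires the PEAR package Net_URL2 (Net/URL2.php) in addition to simple_html_dom.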
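If PEAR isn't available, a small pure-PHP resolver can stand in for Net_URL2::resolve(). This is a minimal sketch, not a full RFC 3986 implementation: `resolveUrl` is a hypothetical helper, it ignores user:pass in the base URL, and for references that are only `?query` or `#fragment` it drops the base's own query string.

```php
<?php
// Hypothetical fallback when PEAR Net_URL2 is not installed.
// Simplified RFC 3986 resolution (see limitations above).
function resolveUrl(string $base, string $ref): string
{
    // A reference with its own scheme (http:, https:, mailto:, ...) is already absolute.
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $ref)) {
        return $ref;
    }

    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $root = $scheme . '://' . ($b['host'] ?? '')
          . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative reference: //host/path keeps the base scheme.
    if (strncmp($ref, '//', 2) === 0) {
        return $scheme . ':' . $ref;
    }

    // Split the reference into its path and its "?query#fragment" suffix.
    $cut = strcspn($ref, '?#');
    $refPath = substr($ref, 0, $cut);
    $suffix = substr($ref, $cut);

    $basePath = $b['path'] ?? '/';
    if ($refPath === '') {
        $path = $basePath;                 // only ?query and/or #fragment
    } elseif ($refPath[0] === '/') {
        $path = $refPath;                  // root-relative
    } else {
        // Document-relative: replace the last segment of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $refPath;
    }

    // Remove "." and ".." segments.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);           // never pop the leading empty segment
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    // A trailing "." or ".." names a directory, so keep the final slash.
    if (in_array(end($segments), ['.', '..'], true)) {
        $out[] = '';
    }

    return $root . implode('/', $out) . $suffix;
}

echo resolveUrl('http://www.targetsite.com/news/index.php', 'contensite.php?blablabla'), "\n";
// http://www.targetsite.com/news/contensite.php?blablabla
echo resolveUrl('http://www.targetsite.com/news/index.php', '../images/logo.png'), "\n";
// http://www.targetsite.com/images/logo.png
```

In the loops, the Net_URL2 call would then become something like `$elem->href = resolveUrl('http://www.targetsite.com', $elem->href);`, passing the page URL (or the resolved <base href>) as the first argument.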