php web-scraping dom

Get previous element of a different type with PHP Simple Html Dom?


I'm hoping this is possible with Simple HTML DOM. I'm scraping a page that looks like this:

<h5>this is title 1</h5>
<img>
<img>
<img>

<h5>this is title 2</h5>
<img>
<img>

<h5>this is title 3</h5>
<img>
<img>
<img>
<img>

etc...

I'm trying to get it to look something like:

<h5>this is title 1</h5>
<img>
<h5>this is title 1</h5>
<img>
<h5>this is title 1</h5>
<img>


<h5>this is title 2</h5>
<img>
<h5>this is title 2</h5>
<img>

This means that for each img I need to find and grab the nearest preceding h5, I think. There are no parent divs or any other structure to make it easier; the page is pretty much as I described it.

The code I'm using looks something like this (simplified):

foreach ($html->find('img') as $image) {

    // do stuff to the img

    $title = $html->find('h5')->prev_sibling();

    echo $title;
    echo $image;
}

Everything I've tried with prev_sibling gets me a "Fatal error: Call to a member function prev_sibling() on a non-object" and I'm wondering if what I'm trying to do is even possible with PHP Simple HTML Dom. I hope so, all the other scrapers I've tried were making me pull my hair out.


Solution

  • The error comes from find('h5') returning an array of elements rather than a single element, so there is no prev_sibling() to call on it. Instead of hunting backwards from each img, select all the h5 and img elements together and loop through them, checking each element's tag. If it's an h5, update your $title variable and don't echo anything. If it's an img, echo the $title before the image. There's no need to go looking for the h5, since you've already cached it.

    Here's an example:

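    $title = ''; // stays empty if an img appears before the first h5

    foreach ( $html->find('h5, img') as $el )
    {
        // Remember the most recent heading and move on.
        if ( $el->tag == 'h5' )
        {
            $title = $el->plaintext;
            continue;
        }

        // $el is an img: repeat the cached heading in front of it.
        echo "<h5>$title</h5>";
        echo $el->outertext;
    }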
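
If you would rather do what the question literally asks and walk backwards from each img, here is a minimal sketch. It assumes the library file is available as simple_html_dom.php next to the script, and that every img is a direct sibling of its h5, as in the question. The helper name nearest_previous_h5 and the sample markup are made up for illustration; they are not part of Simple HTML DOM.

```php
<?php
// Assumption: the Simple HTML DOM library file sits next to this script.
require_once 'simple_html_dom.php';

/**
 * Hypothetical helper (not part of the library): step backwards through
 * the siblings of $node and return the nearest preceding h5, or null
 * if there is none.
 */
function nearest_previous_h5($node)
{
    $prev = $node->prev_sibling();
    while ($prev !== null && $prev->tag !== 'h5') {
        $prev = $prev->prev_sibling();
    }
    return $prev;
}

// Sample markup modelled on the flat structure described in the question.
$html = str_get_html(
    '<h5>this is title 1</h5><img src="a.png"><img src="b.png">' .
    '<h5>this is title 2</h5><img src="c.png">'
);

foreach ($html->find('img') as $image) {
    $heading = nearest_previous_h5($image);
    if ($heading !== null) {
        echo $heading->outertext; // repeat the heading before its image
    }
    echo $image->outertext, "\n";
}
```

This rescans the sibling list for every image, so the single pass over 'h5, img' shown above is likely the cheaper choice on large pages. The walk-back version is mainly useful when you only have one img in hand and need its heading.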