I'm sure this is simple but I'm struggling to get it right. I have the following markup:
<div id="container">
<h3>Instructions</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<h3>Directions</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<h3>Warnings</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
</div>
Any of the three sections might be missing, and they can appear in any order. I want to extract the text in the p tags using Goutte and know which section each one belongs to.
I've tried variations of the following without success:
$node->filter('div#container h3')->each(function (Crawler $node) {
    switch ($node->text()) {
        case 'Instructions':
            //$instructions = $node->filter('p')->text();
            //$instructions = $node->closest('p')->text();
            $instructions = $node->parents()->filter('p')->text();
            break;
        //etc....
    }
});
I've also tried using XPath with the preceding-sibling axis, but can't get it right. I've been trying things along the lines of:
$node->filterXPath("/div[preceding-sibling::h3[normalize-space() = 'Instructions']]");
It doesn't seem like Crawler has a way of traversing to the next immediate sibling of an element, so you may need to use XPath. Use the following-sibling:: axis with a [position() = 1] predicate to limit it to just the very next p that comes after the h3 you want:
$node->filterXPath("/div/h3[normalize-space() = 'Instructions']/following-sibling::p[position() = 1]");
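To check the expression outside Goutte, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. It loops over every h3 in the container and picks up the first p sibling that follows it, so the order of the sections and any missing ones don't matter. The helper name extractSections and the sample HTML are made up for illustration. The selector //div[@id='container'] is an assumption about how the document is rooted, and p[1] is shorthand for p[position() = 1].

```php
<?php
// Hypothetical helper (not part of Goutte): maps each <h3> heading inside
// div#container to the text of the first <p> sibling that follows it.
function extractSections(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $sections = [];
    foreach ($xpath->query("//div[@id='container']/h3") as $heading) {
        // Evaluate the sibling lookup relative to the current heading.
        $paragraph = $xpath->query('following-sibling::p[1]', $heading)->item(0);
        if ($paragraph !== null) {
            $sections[trim($heading->textContent)] = trim($paragraph->textContent);
        }
    }

    return $sections;
}

// Sample markup (illustrative): two of the three sections, in a different order.
$html = <<<HTML
<div id="container">
<h3>Warnings</h3>
<p>Keep away from heat.</p>
<h3>Instructions</h3>
<p>Shake well before use.</p>
</div>
HTML;

$sections = extractSections($html);
echo $sections['Instructions'] ?? '(missing)', PHP_EOL; // Shake well before use.
echo $sections['Directions'] ?? '(missing)', PHP_EOL;   // (missing)
```

Note that following-sibling::p[1] returns the next p even if other elements sit in between. If a heading could appear without its own paragraph, following-sibling::*[1][self::p] is a stricter variant that only matches when the p is the immediately adjacent element.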