I have a bit of php that grabs the html from a page and loads it into a simplexml object. However its not getting the classes of the element within a
The php
//load the html page with curl
$html = curl_exec($ch);
curl_close($ch);
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);
The page html. Which if I do a var_dump of $html shows its been scraped and exists in $html
<li class="large">
<a style="" id="ref_3" class="off" href="#" onmouseover="highlightme('07');return false;" onclick="req('379');return false;" title="">07</a>
</li>
The var_dump (below) of $doc and $sxml show that the a class of 'off' is now missing. Unfortunately I need to process the page based on this class.
[8]=>
object(SimpleXMLElement)#50 (2) {
["@attributes"]=>
array(1) {
["class"]=>
string(16) "large"
}
["a"]=>
string(2) "08"
}
Using simplexml_load_file
and xpath
, see the inline comments.
What you are after, really, once you found the element you need is this
$row->a->attributes()->class=="off"
And the full code below:
// let's take all the divs that have the class "stff_grid"
$divs = $xml->xpath("//*[@class='stff_grid']");
// for each of these elements, let's print out the value inside the first p tag
foreach($divs as $div){
print $div->p->a . PHP_EOL;
// now for each li tag let's print out the contents inside the a tag
foreach ($div->ul->li as $row){
// same as before
print " - " . $row->a;
if ($row->a->attributes()->class=="off") print " *off*";
print PHP_EOL;
// or shorter
// print " - " . $row->a . (($row->a->attributes()->class=="off")?" *off*":"") . PHP_EOL;
}
}
/* this outputs the following
Person 1
- 1 hr *off*
- 2 hr
- 3 hr *off*
- 4 hr
- 5 hr
- 6 hr *off*
- 7 hr *off*
- 8 hr
Person 2
- 1 hr
- 2 hr
- 3 hr
- 4 hr
- 5 hr
- 6 hr
- 7 hr *off*
- 8 hr *off*
Person 3
- 1 hr
- 2 hr
- 3 hr
- 4 hr *off*
- 5 hr
- 6 hr
- 7 hr *off*
- 8 hr
*/