I am aware that this question has been asked many times, but I have looked at many examples and I still can't get the data I need out of this HTML table.
I have a PHP file that generates an HTML table like this:
<table width="97%">
<tr><td align="center">
<!-- table for columns -->
<table border="0" cellpadding="15">
<tr>
<td valign="top">
<table border="0" width="800">
<caption style="font-size: 32px; font-weight: bold;">
</caption>
<!-- force column widths exactly (for some reason it didn't want to
play along with normal width settings) -->
<tr>
<td><img src="/spacer.gif" width="160" height="1" border="0" alt="" /></td>
<td><img src="/spacer.gif" width="170" height="1" border="0" alt="" /></td>
</tr>
<tr>
<td style="">
DATA1
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
0 </td>
</tr>
<tr>
<td style="">
DATA2
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
0 </td>
</tr>
<tr>
<td style="">
DATA3
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
0 </td>
</tr>
<tr>
<td style="">
DATA4
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
5 </td>
</tr>
<tr>
<td style="">
DATA5
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
0 </td>
</tr>
<tr>
<td style="">
DATA6
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
0 </td>
</tr>
<!-- end of stats_with_style loop -->
</table>
</td>
<!-- end of groups loop -->
</tr>
</table>
<br /><br />
</td></tr>
</table>
I want to use PHP to get the number in the styled cell that follows each DATA label (the second cell in each row).
Can anyone shed some light on how I can do this?
I would normally suggest a third-party HTML parser like Ganon, but as long as this HTML's structure stays fairly simple (like this), PHP's built-in DOM extension with XPath queries is probably a simpler, lower-overhead solution. Load your HTML into a string like this:
<?php
$html = <<<EOF
<table width="97%">
<tr><td align="center">
<!--SNIP-->
EOF;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$data = [];
// select <td> elements that have a style attribute and, within each row,
// keep only the second one, i.e. the number cell (XPath positions start at 1)
foreach($xpath->query("//td[@style][position() mod 2 = 0]") as $node) {
// strip all whitespace from the cell text
$data[] = preg_replace('/\s+/', '', $node->nodeValue);
}
You will now have a $data array containing only the numeric values (as strings), in table order.
If you need the keys (DATA1 etc.) as well, it's straightforward to build an associative array: select the first styled cell in each row to get the labels, then combine the two arrays. Just add this code:
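Here is the same idea as a self-contained script you can run as-is. It is only a sketch: it assumes a shortened copy of the table (two DATA rows instead of six), uses trim() instead of the regex because each cell only has whitespace around its value, and turns on libxml_use_internal_errors() so any parser warnings about the markup are not printed. If the table comes from your own PHP file, you could capture its output into $html with ob_start(), include and ob_get_clean() rather than pasting it in.

```php
<?php
// Shortened copy of the generated table (assumption: the real page has six DATA rows).
$html = <<<'HTML'
<table border="0" width="800">
<tr>
<td><img src="/spacer.gif" width="160" height="1" border="0" alt="" /></td>
<td><img src="/spacer.gif" width="170" height="1" border="0" alt="" /></td>
</tr>
<tr>
<td style="">
DATA1
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
0 </td>
</tr>
<tr>
<td style="">
DATA4
</td>
<td width="200" style="font-size: 80px; font-weight:bold;">
5 </td>
</tr>
</table>
HTML;

// Collect libxml parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Within each row, keep the <td> elements that have a style attribute and take
// the second one (XPath positions start at 1), i.e. the number cell.
// The spacer row is skipped because its cells have no style attribute.
$data = [];
foreach ($xpath->query('//td[@style][position() mod 2 = 0]') as $node) {
    $data[] = trim($node->nodeValue);
}

print_r($data);
```

With the full table the array would have six entries, in the same order as the rows.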
$keys = [];
// the first styled <td> in each row holds the label (DATA1, DATA2, ...)
foreach($xpath->query("//td[@style][position() mod 2 = 1]") as $node) {
$keys[] = preg_replace('/\s+/', '', $node->nodeValue);
}
$dataWithKeys = array_combine($keys, $data);
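Because the labels and numbers come from two separate queries, array_combine() only gives the right pairs if both lists line up one-to-one, and duplicate labels would overwrite each other. An alternative sketch (my own variation, not required for the approach above) reads the label and the number from the same row, so each pair always comes from one <tr>. It assumes the layout shown in the question: a label cell followed by a number cell, both with a style attribute, plus a spacer row whose cells have no style.

```php
<?php
// Shortened copy of the generated table (assumption: same layout as in the question).
$html = <<<'HTML'
<table>
<tr><td><img src="/spacer.gif" width="160" height="1" alt="" /></td><td><img src="/spacer.gif" width="170" height="1" alt="" /></td></tr>
<tr><td style="">
DATA1
</td><td width="200" style="font-size: 80px; font-weight:bold;">
0 </td></tr>
<tr><td style="">
DATA4
</td><td width="200" style="font-size: 80px; font-weight:bold;">
5 </td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$dataWithKeys = [];
// Only rows with at least two styled cells (this skips the spacer row).
foreach ($xpath->query('//tr[count(td[@style]) >= 2]') as $row) {
    // Query relative to the current row: its styled cells in document order.
    $cells = $xpath->query('td[@style]', $row);
    $label = trim($cells->item(0)->nodeValue);
    $value = trim($cells->item(1)->nodeValue);
    $dataWithKeys[$label] = (int) $value;
}

print_r($dataWithKeys);
```

The (int) cast assumes the cells always contain whole numbers; drop it if you would rather keep the raw strings.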
Hope that helps!