
Retrieving data from an HTML table using PHP


I am aware that this question has been asked many times, but I have looked at many examples and am still unable to get the data I need out of this HTML table.

I have a PHP file that generates an HTML table like this:

    <table width="97%">
    <tr><td align="center">
    <!-- table for columns -->
    <table border="0" cellpadding="15">
    <tr>
        <td valign="top">

        <table border="0" width="800">
        <caption style="font-size: 32px; font-weight: bold;">
        </caption>

        <!-- force column widths exactly (for some reason it didn't want to
        play along with normal width settings) -->
        <tr>
        <td><img src="/spacer.gif" width="160" height="1" border="0" alt="" /></td>
        <td><img src="/spacer.gif" width="170" height="1" border="0" alt="" /></td>
        </tr>
            <tr>
                <td style="">
                DATA1
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>

            <tr>
                <td style="">
                DATA2
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>
            <tr>
                <td style="">
                DATA3
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
        0            </td>
            </tr>
            <tr>
                <td style="">
                DATA4
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                5            </td>
            </tr>
            <tr>
                <td style="">
                DATA5
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>
            <tr>
                <td style="">
                DATA6
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>


        <!-- end of stats_with_style loop -->

        </table>

        </td>



    <!-- end of groups loop -->

    </tr>
    </table>

    <br /><br />


    </td></tr>
    </table>

I want to get the inner HTML (the number) of each DATA set, that is, the content of the styled <td> cell that follows each label, using PHP.

Can anyone shed some light on how I can do this?


Solution

  • I would normally suggest a third-party DOM parser such as Ganon, but if the HTML structure stays as simple as this, PHP's native DOM extension with XPath selectors is probably the simpler, lower-overhead solution. Load your HTML into a string, parse it, and query it like this:

    <?php
    $html = <<<EOF
    <table width="97%">
        <tr><td align="center">
        <!--SNIP-->
    EOF;
    
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $data = [];
    
    // Targets <td> elements that have a style attribute and, within each row,
    // selects only the even-positioned ones, i.e. the value cells
    // (XPath positions start at 1)
    foreach($xpath->query("//td[@style][position() mod 2 = 0]") as $node) {
        // strip all whitespace from the cell text
        $data[] = preg_replace('/\s+/', '', $node->nodeValue);
    }
    

    You will now have a $data array containing only the numeric values (as strings), as requested.

    If you need the keys (DATA1 etc.) as well, it is straightforward to build an associative array by also collecting the odd-positioned cells (the first styled <td> in each row). Just add this code:

    $keys = [];
    foreach($xpath->query("//td[@style][position() mod 2 = 1]") as $node) {
        $keys[] = preg_replace('/\s+/', '', $node->nodeValue);
    }
    
    $dataWithKeys = array_combine($keys, $data);
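
As a variation on the two loops above, the label and value can be read together, one row at a time, which avoids relying on the two arrays staying in step. This is only a minimal sketch under a few assumptions: the function name `extractStats` is made up for illustration, the sample markup is a cut-down stand-in for the table in the question, and `libxml_use_internal_errors()` is added on the assumption that the real page may contain markup the parser complains about.

```php
<?php
// Hypothetical helper (not part of the original answer):
// returns an array of label => value pairs from the stats table.
function extractStats(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is often not perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($dom);
    $result = [];

    // Only rows with exactly two styled cells are data rows; the spacer
    // row and the outer layout rows have no styled <td> children.
    foreach ($xpath->query('//tr[count(td[@style]) = 2]') as $row) {
        $cells = $xpath->query('td[@style]', $row);
        $label = trim($cells->item(0)->textContent);
        $value = trim($cells->item(1)->textContent);
        $result[$label] = $value;
    }

    return $result;
}

// Cut-down sample in the shape of the table from the question.
$html = <<<'HTML'
<table>
  <tr><td><img src="/spacer.gif" alt="" /></td><td><img src="/spacer.gif" alt="" /></td></tr>
  <tr><td style="">DATA1</td><td width="200" style="font-weight:bold;">0</td></tr>
  <tr><td style="">DATA4</td><td width="200" style="font-weight:bold;">5</td></tr>
</table>
HTML;

print_r(extractStats($html));
```

For this sample the result maps DATA1 to "0" and DATA4 to "5". The values come back as strings, so cast them with `(int)` if you need numbers.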
    

    Hope that helps!