
PHP: extract info from an HTML page


I have this HTML:

<input type=hidden name="code1" value="AA-T5301">
    <tr>
        <td align=left valign=middle class="stdtext">
            AA-T5301
        </a>
        </td>
        <td valign=middle align=left class="stdtext">
            <a onMouseOver="window.status='See the more info on '; return true"
                HREF="product.asp?ms=&dept_id=322&sku=32&nav=">
                Grapeseed Oil 150ml
            </A>
        </td>
        <td valign=middle align=right class="stdtext">Order Now</td>
        <td valign=middle align=right class="stdtext">
            <font class="productsale">
                <strike>£3.04</strike>
                &#160;
            </font>
            £2.04
        </td>
        <td valign=middle align=right class="stdtext">
            <input type=text size=4 name="qty_AA-T5301" value="0">
        </td>
    </tr>
    <input type=hidden name="code2" value="AA-T5302">
        <tr>
            <td align=left valign=middle class="stdtext">
                AA-T5302
            </a>
            </td>
            <td valign=middle align=left class="stdtext">
                <a onMouseOver="window.status='See the more info on '; return true"
                    HREF="product.asp?ms=&dept_id=322&sku=143&nav=">
                    Grapeseed Oil 500ml
                </A>
            </td>
            <td valign=middle align=right class="stdtext">Order Now</td>
            <td valign=middle align=right class="stdtext">
                <font class="productsale">
                    <strike>£6.46</strike>
                    &#160;
                </font>
                £4.33
            </td>
            <td valign=middle align=right class="stdtext">
                <input type=text size=4 name="qty_AA-T5302" value="0">
            </td>
        </tr>
        <input type=hidden name="code3" value="AA-T530">
            <tr>
                <td align=left valign=middle class="stdtext">
                    AA-T530
                </a>
                </td>
                <td valign=middle align=left class="stdtext">
                    <a onMouseOver="window.status='See the more info on '; return true"
                        HREF="product.asp?ms=&dept_id=322&sku=19&nav=">
                        Grapeseed Oil 50ml
                    </A>
                </td>
                <td valign=middle align=right class="stdtext">Out of Stock</td>
                <td valign=middle align=right class="stdtext">
                    <font class="productsale">
                        <strike>£1.75</strike>
                        &#160;
                    </font>
                    £1.17
                </td>
                <td valign=middle align=right class="stdtext">
                    <input type=text size=4 name="qty_AA-T530" value="0">
                </td>
            </tr>

How can I extract the info into arrays, so that I end up with something like this?

product_code_array=(AA-T5301,AA-T5302,AA-T530);

RRP_array=(3.04,6.46,1.75);

price_array=(2.04,4.33,1.17);

Note: there may be more than 3 items on a page at a time, or there may be only 1.


Solution

    <?php
     $text = '<input type=hidden name="code1" value="AA-T5301">
        <tr>
            <td align=left valign=middle class="stdtext">
                AA-T5301
            </a>
            </td>
            <td valign=middle align=left class="stdtext">
                <a onMouseOver="window.status=\'See the more info on \'; return true"
                    HREF="product.asp?ms=&dept_id=322&sku=32&nav=">
                    Grapeseed Oil 150ml
                </A>
            </td>
            <td valign=middle align=right class="stdtext">Order Now</td>
            <td valign=middle align=right class="stdtext">
                <font class="productsale">
                    <strike>£3.04</strike>
                    &#160;
                </font>
                <span id="now">£2.04</span>
            </td>
            <td valign=middle align=right class="stdtext">
                <input type=text size=4 name="qty_AA-T5301" value="0">
            </td>
        </tr>
        <input type=hidden name="code2" value="AA-T5302">
            <tr>
                <td align=left valign=middle class="stdtext">
                    AA-T5302
                </a>
                </td>
                <td valign=middle align=left class="stdtext">
                    <a onMouseOver="window.status=\'See the more info on \'; return true"
                        HREF="product.asp?ms=&dept_id=322&sku=143&nav=">
                        Grapeseed Oil 500ml
                    </A>
                </td>
                <td valign=middle align=right class="stdtext">Order Now</td>
                <td valign=middle align=right class="stdtext">
                    <font class="productsale">
                        <strike>£6.46</strike>
                        &#160;
                    </font>
                    <span id="now">£4.33</span>
                </td>
                <td valign=middle align=right class="stdtext">
                    <input type=text size=4 name="qty_AA-T5302" value="0">
                </td>
            </tr>
            <input type=hidden name="code3" value="AA-T530">
                <tr>
                    <td align=left valign=middle class="stdtext">
                        AA-T530
                    </a>
                    </td>
                    <td valign=middle align=left class="stdtext">
                        <a onMouseOver="window.status=\'See the more info on \'; return true"
                            HREF="product.asp?ms=&dept_id=322&sku=19&nav=">
                            Grapeseed Oil 50ml
                        </A>
                    </td>
                    <td valign=middle align=right class="stdtext">Out of Stock</td>
                    <td valign=middle align=right class="stdtext">
                        <font class="productsale">
                            <strike>£1.75</strike>
                            &#160;
                        </font>
                        <span id="now">£1.17</span>
                    </td>
                    <td valign=middle align=right class="stdtext">
                        <input type=text size=4 name="qty_AA-T530" value="0">
                    </td>
                </tr>';
    
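        // Each preg_match_all() call fills one slot of $values; index 1 of a
        // result holds the text captured by the (...) group of every match.
        $values = array();
        // Product codes: code[0-9]+ also matches code10, code11, and so on.
        preg_match_all('#<input type=hidden name="code[0-9]+" value="([^"]*)">#i', $text, $values[0]);
        // RRP: the struck-through price.
        preg_match_all('#<strike>£([0-9.]+)</strike>#i', $text, $values[1]);
        // Current price: the amount right after </font>. The sample above wraps
        // it in <span id="now">, which the page in the question does not have,
        // so the span is optional here and both forms match.
        preg_match_all('#</font>\s*(?:<span[^>]*>\s*)?£([0-9.]+)#i', $text, $values[2]);

        $product_code_array = $values[0][1];
        $RRP_array = $values[1][1];
        $price_array = $values[2][1];
    ?>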
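
Regular expressions like the ones above tend to break when the markup changes slightly (attribute order, quoting, extra whitespace). As an alternative, here is a minimal sketch that reads the same values with PHP's DOM extension and XPath. It is not part of the original answer: the function name extractProducts() is hypothetical, the sample is a shortened copy of two rows from the question, and it assumes the current price is always the first non-blank text after the <font class="productsale"> element, as it is in the question's HTML.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// parses the page with DOMDocument and queries it with XPath.
function extractProducts(string $html): array
{
    // The markup is malformed (stray </a>, <input> between rows), so collect
    // parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Pull the first decimal number out of a node's text, e.g. "£3.04" -> "3.04".
    // Matching only digits sidesteps any encoding issues with the pound sign.
    $number = function (DOMNode $node): ?string {
        return preg_match('/[0-9]+(?:\.[0-9]+)?/', $node->textContent, $m) ? $m[0] : null;
    };

    // Product codes: hidden inputs named code1, code2, ...
    $codes = [];
    foreach ($xpath->query('//input[@type="hidden"][starts-with(@name, "code")]') as $input) {
        $codes[] = $input->getAttribute('value');
    }

    $rrps = [];
    $prices = [];
    foreach ($xpath->query('//font[@class="productsale"]') as $font) {
        // RRP: the <strike> element inside the <font>.
        $strike = $xpath->query('.//strike', $font)->item(0);
        $rrps[] = $strike ? $number($strike) : null;
        // Current price: first non-blank text node following the <font>.
        $text = $xpath->query('following-sibling::text()[normalize-space()][1]', $font)->item(0);
        $prices[] = $text ? $number($text) : null;
    }

    return [$codes, $rrps, $prices];
}

// Shortened sample taken from the question (two of the three rows).
$html = <<<'HTML'
<input type=hidden name="code1" value="AA-T5301">
<tr>
    <td align=left valign=middle class="stdtext">AA-T5301</a></td>
    <td valign=middle align=right class="stdtext">
        <font class="productsale"><strike>£3.04</strike>&#160;</font>
        £2.04
    </td>
</tr>
<input type=hidden name="code3" value="AA-T530">
<tr>
    <td align=left valign=middle class="stdtext">AA-T530</a></td>
    <td valign=middle align=right class="stdtext">
        <font class="productsale"><strike>£1.75</strike>&#160;</font>
        £1.17
    </td>
</tr>
HTML;

list($product_code_array, $RRP_array, $price_array) = extractProducts($html);
```

The three arrays are filled in document order, so entries at the same index belong to the same product, and the loop handles one row or many without changes. The prices come back as numeric strings; cast them with (float) if you need numbers.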