php parsing web-scraping

Parsing specific data items from website


I am trying to retrieve the following data variables from this webpage

I tried it this way, but I can't separate out some of the data to store in the above variables, so I need some help and suggestions from a PHP expert:

    // file_get_html() comes from the Simple HTML DOM library
    $html = file_get_html('http://www.walmart.com/storeLocator/ca_storefinder_results.do?serviceName=&rx_title=com.wm.www.apps.storelocator.page.serviceLink.title.default&rx_dest=%2Findex.gsp&sfrecords=50&sfsearch_single_line_address=K6T');
    foreach ($html->find('div[class=StoreAddress] div[1]') as $name) {
      echo $name->innertext . '<br>';
    }

The HTML of this website makes it hard to identify each data item by its tag, because there are no proper ids assigned to the tags. Can anyone please suggest an easy and scalable way to parse the above data items from this website?

Thanks


Solution

  • The HTML isn't really that complex. PHP's iterators and DOM/regex functions are clumsy for tasks like this, but it can be done:

    $dom = new DOMDocument();
    @$dom->loadHTMLFile('http://www.walmart.com/storeLocator/ca_storefinder_details_short.do?rx_dest=/index.gsp&rx_title=com.wm.www.apps.storelocator.page.serviceLink.title.default&edit_object_id=2092&sfsearch_single_line_address=K6T');
    $xpath = new DOMXPath($dom);
    
    foreach($xpath->query('//div[@class="StoreAddress"]') as $div) {
      // title
      echo $xpath->query(".//div[1]", $div)->item(0)->nodeValue . "\n";
      // street
      echo $xpath->query(".//div[2]", $div)->item(0)->nodeValue . "\n";
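      // city, state and zip; the pattern expects "City, ST 12345"
      $line = $xpath->query(".//div[3]", $div)->item(0)->nodeValue;
      // only print the parts if the line matched, otherwise $m would be empty
      if (preg_match('/(.*), ([A-Z]{2}) (\d{5})/', $line, $m)) {
        // city
        echo $m[1] . "\n";
        // state
        echo $m[2] . "\n";
        // zip
        echo $m[3] . "\n";
      }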
    }
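
    If you want the fields in variables rather than echoed, the same idea can be wrapped in a function that returns one array per store. This is only a rough sketch: the sample markup is made up to stand in for the real page (the actual HTML may be laid out differently), parseStores() is a hypothetical helper name, and the pattern is widened on the assumption that the third line may end in either a 5-digit ZIP or a Canadian postal code.

```php
<?php
// Made-up markup standing in for the real page; the actual HTML may differ.
$sampleHtml = <<<HTML
<html><body>
<div class="StoreAddress">
  <div>Example Store #1</div>
  <div>123 Example St</div>
  <div>Exampleville, ON K6T 1A1</div>
</div>
</body></html>
HTML;

/**
 * Return one associative array per StoreAddress block.
 * parseStores() is an illustrative helper, not part of any library.
 */
function parseStores(string $html): array
{
    // Collect libxml parse warnings quietly instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $stores = [];

    foreach ($xpath->query('//div[@class="StoreAddress"]') as $block) {
        // Direct child divs in document order: title, street, "city, region code".
        $lines = [];
        foreach ($xpath->query('./div', $block) as $child) {
            $lines[] = trim($child->textContent);
        }
        if (count($lines) < 3) {
            continue; // layout differs from what this sketch assumes
        }

        $store = [
            'title'  => $lines[0],
            'street' => $lines[1],
            'city'   => null,
            'region' => null,
            'postal' => null,
        ];

        // Assumed formats: 5-digit ZIP or a Canadian postal code like "K6T 1A1".
        $pattern = '/^(.+),\s*([A-Z]{2})\s+(\d{5}|[A-Z]\d[A-Z]\s?\d[A-Z]\d)$/';
        if (preg_match($pattern, $lines[2], $m)) {
            $store['city']   = $m[1];
            $store['region'] = $m[2];
            $store['postal'] = $m[3];
        }
        $stores[] = $store;
    }

    return $stores;
}

print_r(parseStores($sampleHtml));
```

    Leaving city, region and postal as null when the third line doesn't match means a layout change shows up as missing values instead of undefined-index notices.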