php regex web-scraping

How to use PHP to echo the entire word containing a given character sequence?


I have a PHP scraper that fetches a list of URLs and echoes the content of a given div on each page. I want to modify it to search the page's HTML for occurrences of a string and echo the entire word in which each occurrence appears.

My current scraper is this:

<?php
$urls = array(
    "http://www.sample1.html",
    "http://www.sample2.html",
    "http://www.sample3.html",
);
foreach ($urls as $url) {
    $content = file_get_contents($url);
    // Keep everything after the opening tag of the target div...
    $first_step = explode('<div class="div1">', $content);
    // ...then cut it off at the first closing </div>.
    $second_step = explode("</div>", $first_step[1]);
    echo $second_step[0] . "<br>";
}
?>

I want it to look more like this, only working:

$first_step = explode( 'eac' , $content );

With the results being:

  1. teacher
  2. preacher
  3. each, etc.

Solution

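  • You can use the following regex with preg_match_all instead of explode (preg_match would stop at the first match). The \w* on each side extends the match to the whole word containing eac:

    (\w*eac\w*)
    

    Code (the pattern needs delimiters such as /; without them PHP treats the outer parentheses as the delimiters and no group 1 is captured):

    preg_match_all('/(\w*eac\w*)/', $content, $matches);
    foreach ($matches[1] as $word) {
        echo $word . "<br>";
    }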
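
For a reusable version, the same regex idea can be wrapped in a small function. This is only a sketch under a few assumptions: the name wordsContaining and the sample string are made up for illustration, strip_tags is used so that tag and attribute names are not reported as words, and duplicate words are removed, which the question did not explicitly ask for.

```php
<?php
// Hypothetical helper (not part of the original answer): returns every
// distinct word in $html that contains the substring $needle.
function wordsContaining(string $html, string $needle): array
{
    // Remove HTML tags so tag and attribute names are not matched as words.
    $text = strip_tags($html);

    // preg_quote escapes regex metacharacters in the search string.
    // Flag i makes the match case-insensitive; u treats input as UTF-8.
    $pattern = '/\w*' . preg_quote($needle, '/') . '\w*/iu';

    // preg_match_all returns false on failure (for example invalid UTF-8).
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    // $matches[0] holds the full matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Made-up sample input standing in for file_get_contents($url).
$sample = '<div class="div1">The teacher and the preacher each spoke; the teacher left.</div>';

foreach (wordsContaining($sample, 'eac') as $i => $word) {
    echo ($i + 1) . '. ' . $word . "\n";
}
```

With the sample string above this prints teacher, preacher and each as a numbered list. Inside the scraper loop, the call would presumably take $content in place of $sample.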