php html web-scraping html-parsing

PHP: get the <h[1-6]></h[1-6]> values from HTML text


In my code I have the following regexp:

 preg_match_all('/<title>([^>]*)<\/title>/si', $contents, $match );

That retrieves the <h>..</h> tags from a webpage. But sometimes a heading contains nested HTML tags such as <strong>, <b>, etc., so the pattern needs some modification. I therefore tried this one:

preg_match_all('/<h[1-6]>(.*)<\/h[1-6]>/si', $contents, $match );

But something is wrong: it does not retrieve the content inside the HTML <h> tags. How can I fix it?


Solution

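  • // Capture the level digit of every opening heading tag, e.g. "2" for <h2>.
    // Note: without explicit delimiters, '<h\d>' treats < and > as the delimiters.
    preg_match_all('/<h([1-6])\b[^>]*>/i', $contents, $matches);
    
    // $matches[1] holds the captured heading levels, one entry per tag found.
    $num = [];
    foreach ($matches[1] as $level) {
        $num[] = $level;
    }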
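
The snippet above only collects the heading levels. To get the heading contents that the question asks for, a pattern along the following lines may work. This is a minimal sketch, not a definitive implementation. The second pattern in the question most likely fails for two reasons: the greedy `(.*)` with the `s` flag runs from the first opening heading to the last closing one, and `<h[1-6]>` rejects tags that carry attributes. The function name `extract_headings` and the sample `$contents` string are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper: returns one entry per heading with its level,
// its inner HTML, and its text with nested tags removed.
function extract_headings(string $html): array
{
    // <h([1-6])   opening tag; capture the heading level
    // \b[^>]*>    allow optional attributes such as class="..."
    // (.*?)       lazy match, so it stops at the first matching close tag
    // <\/h\1\s*>  closing tag must use the same level (backreference \1)
    $pattern = '/<h([1-6])\b[^>]*>(.*?)<\/h\1\s*>/si';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $headings = [];
    foreach ($matches as $m) {
        $headings[] = [
            'level' => (int) $m[1],
            'html'  => $m[2],                    // may contain <strong>, <b>, ...
            'text'  => trim(strip_tags($m[2])),  // nested tags stripped
        ];
    }
    return $headings;
}

// Sample input, assumed for illustration only.
$contents = '<h1 class="t">Main <strong>title</strong></h1><p>x</p><h2>Sub <b>one</b></h2>';

foreach (extract_headings($contents) as $h) {
    echo "h{$h['level']}: {$h['text']}\n";
}
```

Regular expressions remain fragile on real-world markup (comments, malformed or nested headings), so for anything beyond simple pages an HTML parser such as PHP's DOM extension is usually the safer choice.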