
How to extract Heading tags in PHP from a string?


From a string that contains a lot of HTML, how can I extract all the text from <h1>, <h2>, etc. tags into a new variable?

I would like to capture all of the text from these elements and store them in a new variable as comma-delimited values.

Is it possible using preg_match_all()?


Solution

  • If you actually want to use regular expressions, I think that:

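    preg_match_all('/<h([1-6])\b[^>]*>(.*?)<\/h\1>/is', $string, $matches);
    $headings = implode(', ', array_map('strip_tags', $matches[2]));
    

    should work as long as your heading tags are not nested. The pattern matches <h1> through <h6> (there is no <h0>), allows attributes such as <h2 class="title">, and uses a backreference so each opening tag is paired with the closing tag of the same level. The text of each heading ends up in $matches[2]; strip_tags() removes inline markup such as <em>, and implode() joins the results into a comma-separated string. As others have said, if you're not in control of the HTML, regular expressions are not a great way to do this.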
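If you can avoid regular expressions, PHP's built-in DOM extension copes better with attributes, nested inline markup, and malformed HTML. The sketch below is not part of the original answer: the function name extractHeadings and the sample HTML are hypothetical, and it assumes the DOM extension (enabled in most PHP builds) is available. It selects every h1 to h6 element with DOMXPath and joins their text with commas.

```php
<?php
// Hypothetical helper: return the text of all <h1>..<h6> elements in $html,
// in document order, joined as comma-separated values.
function extractHeadings(string $html): string
{
    // DOMDocument::loadHTML() rejects an empty string in PHP 8, so bail out early.
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world HTML is often not well-formed; collect parser warnings
    // internally instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // One location path over all elements, filtered to heading levels 1-6.
    $nodes = $xpath->query('//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]');

    $texts = [];
    foreach ($nodes as $node) {
        // textContent includes the text of nested inline elements like <em>.
        $texts[] = trim($node->textContent);
    }

    return implode(', ', $texts);
}

$html = '<div><h1>Title</h1><p>Intro</p><h2 class="sub">Part <em>one</em></h2><h3>Details</h3></div>';
echo extractHeadings($html), PHP_EOL; // prints "Title, Part one, Details"
```

Note that loadHTML() assumes ISO-8859-1 when the markup has no charset declaration, so non-ASCII input may need a <meta charset="utf-8"> tag (or similar) to be decoded correctly.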