I want to extract the Open Graph meta tags from a URL. If a tag has a data attribute before property, it is not extracted. How should I change the regular expression?
HTML Code
<meta property="og:title" content="111">
<meta data-one="true" property="og:description" content="222">
<meta data-two="true" property="og:image" content="333">
<meta data-three="true" data-another="true" property="og:url" content="444">
PHP Code
preg_match_all('~<\s*meta\s*property="(og:[^"]+)"\s*content="([^"]*)~i', $html, $matches);
Result
Array(
[0] => og:title
)
Expected Result
Array(
[0] => og:title,
[1] => og:description,
[2] => og:image,
[3] => og:url
)
The problem is with the second and third \s*, each of which matches only zero or more whitespace characters, so any extra attribute in those positions breaks the match. In the second position you want \b.*\b instead: a word boundary (the end of meta), then anything, then another word boundary (the start of property). In the third position \s.*\b is needed, because the double quote (") is not a word character and so there is no word boundary right after it. The fixed regex is:
preg_match_all('~<\s*meta\b.*\bproperty="(og:[^"]+)"\s.*\bcontent="([^"]*)~i', $html, $matches);
See the example here.
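As a rough check, here is a minimal sketch that runs the fixed pattern against the four sample tags from the question. The variable names ($html, $pattern, $tags, $strict, $oneLine) are made up for illustration. The second pattern, which swaps .* for [^>]*, is an assumption of mine and not part of the answer above. It is there because .* is greedy and does not cross newlines, so the fixed pattern relies on each tag sitting on its own line.

```php
<?php
// Sample markup from the question, one tag per line.
$html = <<<'HTML'
<meta property="og:title" content="111">
<meta data-one="true" property="og:description" content="222">
<meta data-two="true" property="og:image" content="333">
<meta data-three="true" data-another="true" property="og:url" content="444">
HTML;

// Fixed pattern from the answer: \b.*\b and \s.*\b allow extra attributes
// before property and between property and content.
$pattern = '~<\s*meta\b.*\bproperty="(og:[^"]+)"\s.*\bcontent="([^"]*)~i';
preg_match_all($pattern, $html, $matches);

// $matches[1] holds the property names, $matches[2] the content values.
$tags = array_combine($matches[1], $matches[2]);
print_r($tags);

// Hypothetical variation (not from the answer): [^>]* cannot run past the
// closing ">" of a tag, so it also copes with several tags on one line.
$strict = '~<\s*meta\b[^>]*\bproperty="(og:[^"]+)"\s[^>]*\bcontent="([^"]*)~i';
$oneLine = str_replace("\n", '', $html);
preg_match_all($strict, $oneLine, $strictMatches);
print_r($strictMatches[1]);
```

Both calls should find all four properties. Either pattern still assumes that property comes before content inside the tag. If the markup is less regular than the sample, an HTML parser such as PHP's DOMDocument is likely a safer choice than a regular expression.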