regex

Edit regex pattern to match src between single quotes


In C#, this pattern matches img tags and captures the src URL in the src group:

<img.*?src=.*?\"(?<src>([^\"]*?))\".*?>

But in some documents the match fails because the src is enclosed in single quotes.

example:

<div class="tableauPlaceholder" id="viz1749842670060" style="position: relative"><noscript><a href='#'><img alt='Dashboard 1 ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Pu&#47;PuntualitneipagamentiB2Bdifferenzeperdimensioneesettore&#47;Dashboard1&#47;1_rss.png' style='border: none'></a></noscript><object class="tableauViz" style="display:none;">
</div>

I really don't understand why the src group contains tableauViz, the content of the class attribute.

Is there a way to edit the pattern so that it correctly matches the src of the image tag even when the value is between single quotes?


Solution

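  • Your current regex only accepts a double-quoted value ("). When the src is enclosed in single quotes, the .*? after src= lazily skips over the whole single-quoted URL and past the end of the img tag until it reaches the first double quote in the document, which is the one opening class="tableauViz". That is why the src group ends up containing tableauViz.

    Try this pattern, which accepts either quote character and uses [^>] so the match cannot run past the end of the img tag: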

    <img[^>]*?\s+src\s*=\s*['"](?<src>[^'"]+)['"][^>]*?>
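A minimal C# sketch of how this might be used. It is a slightly stricter variation of the pattern above, not part of the original answer: the opening quote is captured in a group named q (a name chosen here for illustration) and \k<q> requires the same quote to close the value, so a URL containing the other quote character is not cut short. The class and method names are hypothetical, and decoding the &#47; entities with WebUtility.HtmlDecode is an assumption about what you want to do with the captured value.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ImgSrcExtractor
{
    // (?<q>['"]) captures the opening quote; \k<q> demands the same quote to close.
    // [^>] keeps the match inside the <img ...> tag, and \s before src avoids
    // matching attributes such as data-src.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?<q>['""])(?<src>.*?)\k<q>[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the HTML-decoded src of the first <img> tag, or null if none matches.
    public static string ExtractSrc(string html)
    {
        Match m = ImgSrc.Match(html);
        return m.Success ? WebUtility.HtmlDecode(m.Groups["src"].Value) : null;
    }

    public static void Main()
    {
        // Shortened version of the sample from the question.
        string html =
            "<noscript><a href='#'><img alt='Dashboard 1 ' " +
            "src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Dashboard1&#47;1_rss.png' " +
            "style='border: none'></a></noscript>" +
            "<object class=\"tableauViz\" style=\"display:none;\">";

        Console.WriteLine(ExtractSrc(html));
        Console.WriteLine(ExtractSrc("<img src=\"a.png\" alt=\"x\">"));
    }
}
```

Note that inside a C# verbatim string (@"...") a double quote is written as "". The pattern still assumes that no attribute value contains a > character and that the src value is always quoted. If the documents are arbitrary HTML, an HTML parser is likely to be more robust than a regex.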