php html regex string preserve

Regex - how to remove specific html tag preserving the content in it?


I have this HTML code where I need to remove the span tags, which will always have the same attributes, but I want to preserve the content inside them. How can I make a regex work for this? I am not much of an expert with regex.

<span class="span-class"> My mother has blue eyes. </span>

The regex I have so far does remove the span tags, but it also removes the content. What is the issue with the regex? Can someone please help me? Thanks. This is what I tried:

Regex -> <span\s+class="span-class">([\s\S]*?)</span>


Solution

  • Parsing HTML with regex is tricky, but for input you know will be limited it can work pretty well, as long as the possible tag names, attributes, and class names don't get out of hand. Your pattern actually seems to work OK for me; the only change I made was to escape the slash in the closing </span> tag, which is needed wherever / is the regex delimiter (as in a JS regex literal):

    <span\s+class="span-class">([\s\S]*?)<\/span>

    See it in action here https://regexr.com/7hqgr or below in JS

    const patt = /<span\s+class="span-class">([\s\S]*?)<\/span>/ig;
    const str = `<span class="span-class"> My mother has blue eyes. </span>`;
    
    console.log(patt.exec(str));

    Not sure if you need it, but an alternative that allows any tag name, as long as it has a matching end tag, is to use a backreference in the pattern:

    <([a-z0-9]+)\s+class="span-class">([\s\S]*?)<\/\1>

    https://regexr.com/7hqgl

    const patt = /<([a-z0-9]+)\s+class="span-class">([\s\S]*?)<\/\1>/ig;
    
    console.log([...`<span class="span-class"> Example 1 </span>`.matchAll(patt)][0]);
    console.log([...`<div class="span-class"> Example 2 </div>`.matchAll(patt)][0]);
    console.log([...`<h4 class="span-class"> Example 3 </h4>`.matchAll(patt)][0]);
    
    console.log([...`<h4 class="span-class"> No match - no matching tags! </h5>`.matchAll(patt)][0]);
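
Since the patterns above match fine, the content is most likely being lost at the replacement step, for example by replacing the whole match with an empty string. That is an assumption, because the question doesn't show the replacement code. A minimal sketch of keeping the content is to substitute the capture group back in. The function names `unwrapSpan` and `unwrapAnyTag` are made up for illustration; the patterns are the two shown above.

```javascript
// Hypothetical helpers (names are illustrative); patterns are the two from the answer.
const spanPattern = /<span\s+class="span-class">([\s\S]*?)<\/span>/gi;
const anyTagPattern = /<([a-z0-9]+)\s+class="span-class">([\s\S]*?)<\/\1>/gi;

// Replace each match with capture group 1, i.e. the text between the tags.
function unwrapSpan(html) {
  return html.replace(spanPattern, "$1");
}

// In the backreference variant group 1 is the tag name, so the content is group 2.
function unwrapAnyTag(html) {
  return html.replace(anyTagPattern, "$2");
}

const input = `<p><span class="span-class"> My mother has blue eyes. </span></p>`;
console.log(unwrapSpan(input)); // <p> My mother has blue eyes. </p>
console.log(unwrapAnyTag(`<h4 class="span-class"> Example 3 </h4>`));
```

The same idea should carry over to PHP, where preg_replace also accepts $1 in the replacement string.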