java regex pcre regex-greedy replaceall

RegEx for matching between any two HTML tags


I have the following content:

<div class="TEST-TEXT">hi</span>
<a href=\"https://en.wikipedia.org/wiki/TEST-TEXT\">first young CEO's TEST-TEXT</a>
<span class="test">hello</span>

I am trying to match the TEST-TEXT string so that I can replace its value, but only when it appears as text and not inside an attribute value.

I have looked into look-ahead and look-behind in regex, but the issue is that the look-behind needs a fixed width for the match. The question regex-match-all-characters-between-two-html-tags shows a very similar case, except that it relies on a span with a class to create the match. I also checked regex-match-attribute-in-a-html-code.

Here are the two regular expressions I have tried:

  1. \"([^"]*)\"
  2. (?s)(?<=<([^{]*)>)(.+?)(?=</.>)

Neither works for me. You can try them at [https://regex101.com/r/ApbUEW/2].

I expect it to match the string only when it is text. The current behavior is that it matches in both cases.

Edit: I want the text to be dynamic, not specific to TEST-TEXT.


Solution

  • A regex for matching a string between any two HTML tags:

    (?![^<>]*>)(TEST\-TEXT)
    

    Here, assuming that you have valid HTML, the negative lookahead makes sure that we are not inside a tag definition, where the tag's attributes are defined. It does that by ensuring that the next angle bracket that appears is not >, which would indicate that we are inside a tag definition.

    Note that the following regex would also achieve the same outcome:

    (?=[^<>]*<)(TEST\-TEXT)
    

    Or even

    (TEST\-TEXT)(?=[^<>]*<)
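
Since the question tags Java and asks for the text to be dynamic, here is a minimal sketch of how the negative lookahead above might be used with replaceAll. The class name TextNodeReplace and the method replaceOutsideTags are hypothetical names chosen for illustration. It assumes the same thing the regex does: reasonably valid HTML with no raw < or > inside attribute values or text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class TextNodeReplace {

    // Replaces every occurrence of `target` that is not inside a tag definition.
    // Assumption: the HTML has no raw '<' or '>' inside attribute values or text.
    static String replaceOutsideTags(String html, String target, String replacement) {
        // Pattern.quote treats the target as a literal, so it can be any string.
        Pattern p = Pattern.compile("(?![^<>]*>)" + Pattern.quote(target));
        // Matcher.quoteReplacement keeps '$' and '\' in the replacement literal.
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<div class=\"TEST-TEXT\">hi</span>\n"
                + "<a href=\"https://en.wikipedia.org/wiki/TEST-TEXT\">first young CEO's TEST-TEXT</a>\n"
                + "<span class=\"test\">hello</span>";
        // Only the TEST-TEXT in the link text is replaced; both attribute values stay.
        System.out.println(replaceOutsideTags(html, "TEST-TEXT", "REPLACED"));
    }
}
```

Quoting the target means characters such as the hyphen do not need manual escaping. For HTML that may not be well-formed, an HTML parser is likely a safer choice than a regex.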