java regex replace regex-group regex-replace

How do I replace a certain character between two strings using regex?


I'm new to regex and have been trying to work this out on my own, but I can't get it working. I have an input that contains start and end flags, and I want to replace a certain character, but only where it appears between the flags.

For example, say the start flag is START, the end flag is END, and the character I'm trying to replace is ", which I want to replace with \".

I would call input.replaceAll(regex, "\\\\\"");

I tried writing a regex that matches only the " characters between the flags, but so far I have only managed to match everything between the flags, not just the " characters. My attempt: (?<=START)(.*)(?=END)

Example input:

This " is START an " example input END string ""
START This is a "" second example END
This" is "a START third example END " "

Expected output:

This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "

Solution

  • Find all characters between START and END, and for those characters replace " with \".

    To achieve this, apply a replacer function to all matches of characters between START and END (Matcher#replaceAll accepts a function since Java 9):

    string = Pattern.compile("(?<=START).*?(?=END)").matcher(string)
        .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    

    which produces your expected output.
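As a rough, self-contained sketch of the snippet above, the following wraps it in a small class and runs it against the three example inputs from the question. The class and method names (FlagEscaper, escapeQuotes) are made up for illustration, and Java 9 or later is assumed.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class FlagEscaper {
    // Reluctantly matches the text between START and END (same line only,
    // because '.' does not match line terminators by default).
    private static final Pattern BETWEEN_FLAGS =
            Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotes(String input) {
        // The function's result is treated as a replacement string, so the
        // backslash is doubled here and collapses to a single \ in the output.
        return BETWEEN_FLAGS.matcher(input)
                .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    }

    public static void main(String[] args) {
        String[] inputs = {
            "This \" is START an \" example input END string \"\"",
            "START This is a \"\" second example END",
            "This\" is \"a START third example END \" \""
        };
        for (String input : inputs) {
            System.out.println(escapeQuotes(input));
        }
    }
}
```

Only the quotes between the flags gain a backslash; quotes outside START and END are left alone.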

    Some notes on how this works.

    The first step is to match all characters between START and END, which uses lookarounds with a reluctant quantifier:

    (?<=START).*?(?=END)
    

    The ? after the .* changes the match from greedy (as many chars as possible while still matching) to reluctant (as few chars as possible while still matching). This prevents the middle quote in the following input from being altered:

    START a"b END c"d START e"f END
    

    A greedy quantifier will match from the first START all the way past the next END to the last END, incorrectly including c"d.
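To make the difference concrete, here is a small sketch that runs the same replacement with both quantifiers on the input above. The class and method names (GreedyVsReluctant, escapeBetween) are hypothetical, and Java 9 or later is assumed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class GreedyVsReluctant {
    // Escapes every " inside each match of the given regex.
    static String escapeBetween(String regex, String input) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    }

    public static void main(String[] args) {
        String input = "START a\"b END c\"d START e\"f END";

        // Reluctant: two separate matches, so c"d is untouched.
        System.out.println(escapeBetween("(?<=START).*?(?=END)", input));
        // prints: START a\"b END c"d START e\"f END

        // Greedy: one match spanning to the last END, so c"d is escaped too.
        System.out.println(escapeBetween("(?<=START).*(?=END)", input));
        // prints: START a\"b END c\"d START e\"f END
    }
}
```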

    The next step is to replace " with \" in each match. The full match is group 0, which is what MatchResult#group() returns. We don't need regex for this replacement: plain String#replace is enough (and yes, replace() replaces all occurrences).
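One caveat worth hedging against: the string returned by the replacer function is itself interpreted as a replacement string, so a $ or \ elsewhere in the matched text would be read as a group reference or an escape. If the text between the flags might contain those characters, a possible variant is to do a plain replacement and then wrap the result in Matcher.quoteReplacement. The sketch below assumes Java 9 or later, and the class name SafeFlagEscaper is invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant; the names are illustrative only.
final class SafeFlagEscaper {
    private static final Pattern BETWEEN_FLAGS =
            Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotes(String input) {
        return BETWEEN_FLAGS.matcher(input).replaceAll(mr ->
                // Replace " with \" literally, then quote the whole result so
                // that any $ or \ in it is inserted as-is.
                Matcher.quoteReplacement(mr.group().replace("\"", "\\\"")));
    }

    public static void main(String[] args) {
        System.out.println(escapeQuotes("START cost $5 \"x\" END"));
        // prints: START cost $5 \"x\" END
    }
}
```

With this variant the inner replace() uses a single escaped backslash, because quoteReplacement takes care of the doubling.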