c++ regex boost-regex

Replacing the content of capture groups in a C++ regex


I'm using C++20 (Visual Studio 2022) with std::regex (although I recently had to switch to boost::regex to get a non-recursive implementation due to issues like this).

I have a long regex with a few capture groups. Here's a greatly simplified example:

    string s("123 hello 456 a,b,c 789 1:2:3");
    regex r(R"regex(\d+ (\w+) \d+ ([\w,]+) \d+ ([\w:]+))regex");
    smatch m;
    regex_match(s, m, r);
    // at this point m[1] == "hello", m[2] == "a,b,c", m[3] == "1:2:3"

I would like to modify and replace each capture group, e.g. let's say I wanted to reverse each one to generate this string:

"123 olleh 456 c,b,a 789 3:2:1"

What's the recommended way to do this? Ideally I'd like to do it in such a way that both std::regex and boost::regex support it. If it greatly simplifies things then a boost-specific answer would help me in this particular case.

Note that there are lots of examples out there of matching a single regex pattern multiple times, and each time you can modify the matching text and replace it with modified text. My case is different, as I have a single long regex with multiple capture groups within it. I'd like to apply different logic to each capture group to compute its replacement.

Also note that the logic isn't a function of each capture group independently. So I'd like to do something like this (continuing the above code):

    // at this point m[1] == "hello", m[2] == "a,b,c", m[3] == "1:2:3"
    string s1 = m[1], s2 = m[2], s3 = m[3];
    compute_replacements(s1, s2, s3); // s1,s2,s3 are modified by this function

    string result = this_question_is_about_what_to_put_here(s, r, s1, s2, s3);
    // result == original string with each capture group replaced by s1, s2, s3 respectively 

Solution

  • I used the position and length that the match results report for each capture group to perform the string replacements, like this:

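    // m must come from matching the original string, which must still be alive.
    // r[i - 1] is the replacement for capture group i.
    string replace_capture_groups(string s, const smatch& m, const vector<string>& r) {

        // Make sure we have at least one replacement
        if (r.empty()) return s;

        // Process groups from right to left so that earlier offsets aren't invalidated.
        // This assumes the groups don't overlap (no nested capture groups).
        for (size_t i = r.size(); i > 0; --i) {
            if (!m[i].matched) continue; // skip groups that didn't participate in the match
            s.replace(m.position(i), m.length(i), r[i - 1]);
        }
        return s;
    }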
    

    This approach worked best for me because it doesn't require me to add extra capture groups to my original regex just to capture the entire string.
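
For completeness, here is a minimal sketch of how the approach above might be applied to the example from the question. The names `replace_groups_generic` and `demo` are hypothetical and not part of the original answer. The helper is templated on the match type on the assumption that `boost::smatch` exposes the same `empty()`, `size()`, `position()`, `length()` and `operator[]` members as `std::smatch`; only the `std::regex` path is exercised here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical generalization of replace_capture_groups: Match is expected
// to be std::smatch (or, by assumption, boost::smatch).
// repl[k] is the replacement text for capture group k + 1.
template <class Match>
std::string replace_groups_generic(std::string s, const Match& m,
                                   const std::vector<std::string>& repl) {
    if (m.empty()) return s;  // the match failed, nothing to replace

    // There must not be more replacements than capture groups.
    assert(repl.size() < m.size());

    // Walk right to left so that the offsets of earlier groups stay valid.
    for (std::size_t i = repl.size(); i > 0; --i) {
        if (!m[i].matched) continue;  // group did not participate in the match
        s.replace(static_cast<std::size_t>(m.position(i)),
                  static_cast<std::size_t>(m.length(i)), repl[i - 1]);
    }
    return s;
}

// Reverses every capture group of the question's example string.
std::string demo() {
    const std::string s = "123 hello 456 a,b,c 789 1:2:3";
    const std::regex r(R"(\d+ (\w+) \d+ ([\w,]+) \d+ ([\w:]+))");
    std::smatch m;
    if (!std::regex_match(s, m, r)) return s;

    // Stand-in for compute_replacements: all groups are available at once,
    // so the replacements could depend on each other if needed.
    std::vector<std::string> repl;
    for (std::size_t i = 1; i < m.size(); ++i) {
        std::string g = m[i].str();
        std::reverse(g.begin(), g.end());
        repl.push_back(g);
    }
    return replace_groups_generic(s, m, repl);
}
```

Calling `demo()` returns "123 olleh 456 c,b,a 789 3:2:1". Note that the right-to-left loop relies on the capture groups being non-overlapping and numbered in left-to-right order; a regex with nested groups would need a different strategy, for example replacing only the outermost groups.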