Regex Replacing &#58; to ":" etc


I've got a bunch of strings like:

"Hello, here's a test colon&#58;. Here's a test semi-colon&#59;"

I would like to replace that with

"Hello, here's a test colon:. Here's a test semi-colon;"

And so on for all printable ASCII values.

At present I'm using boost::regex_search to match &#(\d+);, building up a string as I process each match in turn (including appending the substring containing no matches since the last match I found).

Can anyone think of a better way of doing it? I'm open to non-regex methods, but regex seemed a reasonably sensible approach in this case.

Thanks,

Dom


Solution

  • The big advantage of using a regex is that it deals with tricky cases like &#38;#38;, which should become &#38; and nothing more: entity replacement isn't iterative, it's a single step. The regex is also going to be fairly efficient: the two lead characters are fixed, so it will quickly skip anything not starting with &#. Finally, the regex solution is one without a lot of surprises for future maintainers.

    I'd say a regex was the right choice.

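    Is it the best regex, though? You know you need at least two digits, and if there are three, the first one will be a 1. Printable ASCII is, after all, &#32; through &#126;. For that reason, you could consider &#1?\d\d;. That pattern still admits a few out-of-range values (such as &#10; or &#199;), so a range check on the converted number is still worthwhile.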

    As for replacing the content, I'd use the basic algorithm described for boost::regex_replace:

    For each match // Using regex_iterator<>
        Print the prefix of the match
        Strip the first two characters and the last character of the match (the &# and the ;)
        lexical_cast the remaining digits to int, then narrow to char and append.
    
    Print the suffix of the last match.
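
The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses std::regex from C++11 instead of boost::regex so that it builds without Boost (boost::sregex_iterator offers the same iterator-style interface), std::stoi stands in for lexical_cast, the function name decode_printable_ascii is made up for this example, and any code outside 32..126 is copied through unchanged. A capture group is used to pick out the digits rather than trimming characters off the match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: decodes &#NN; / &#1NN; references for printable ASCII.
std::string decode_printable_ascii(const std::string& input) {
    // Two digits, optionally preceded by a 1, as suggested in the answer.
    static const std::regex entity("&#(1?[0-9][0-9]);");

    std::string out;
    out.reserve(input.size());

    auto last = input.cbegin();  // just past the previous match
    for (std::sregex_iterator it(input.cbegin(), input.cend(), entity), end;
         it != end; ++it) {
        const std::smatch& m = *it;

        // "Print the prefix": text between the previous match and this one.
        out.append(last, m[0].first);

        // Group 1 holds only the digits, so no manual trimming is needed.
        const int code = std::stoi(m[1].str());
        if (code >= 32 && code <= 126) {
            out.push_back(static_cast<char>(code));
        } else {
            out.append(m[0].first, m[0].second);  // not printable ASCII: keep as-is
        }
        last = m[0].second;
    }

    // "Print the suffix of the last match."
    out.append(last, input.cend());
    return out;
}
```

Because the output is built in a single pass and never rescanned, an input such as "&#38;#38;" comes out as "&#38;" rather than being decoded a second time.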