I've got a bunch of strings like:
"Hello, here's a test colon:. Here's a test semi-colon;"
I would like to replace that with
"Hello, here's a test colon:. Here's a test semi-colon;"
And so on for all printable ASCII values.
At present I'm using boost::regex_search
to match &#(\d+);
, building up a string as I process each match in turn (including appending the substring containing no matches since the last match I found).
Can anyone think of a better way of doing it? I'm open to non-regex methods, but regex seemed a reasonably sensible approach in this case.
Thanks,
Dom
The big advantage of using a regex is to deal with the tricky cases like &
Entity replacement isn't iterative, it's a single step. The regex is also going to be fairly efficient: the two lead characters are fixed, so it will quickly skip anything not starting with &#
. Finally, the regex solution is one without a lot of surprises for future maintainers.
I'd say a regex was the right choice.
Is it the best regex, though? You know you need two digits and if you have 3 digits, the first one will be a 1. Printable ASCII is after all  -~
. For that reason, you could consider ?\d\d;
.
As for replacing the content, I'd use the basic algorithm described for boost::regex::replace :
For each match // Using regex_iterator<>
Print the prefix of the match
Remove the first 2 and last character of the match (&#;)
lexical_cast the result to int, then truncate to char and append.
Print the suffix of the last match.