I want to remove most special characters from a string (in JavaScript), but keep some special cases, like c++, c# and more. I have experimented with the XRegExp library in Node.js and I am able to remove everything that is not a letter or a number, I think. I would also like to keep letters from all languages. This is what I have so far:
var str = "I do programming in c++ and sometimes c#, but + and # should be removed";
var regex = XRegExp('[^\\s\\p{N}\\p{L}]+', 'g');
var replaced = XRegExp.replace(str, regex, "");
console.log(replaced);
This outputs
I do programming in c and sometimes c, but and should be removed
I need to create some kind of list with allowed words, like c++ and c#. Desired output is:
I do programming in c++ and sometimes c#, but and should be removed
You can just use alternations inside a capturing group and then restore this text with a backreference in the replacement pattern:
var str = "I do programming in c++ and sometimes c#, but + and # should be removed";
// Capture group 1 is the first alternative: (\b(?:c[+]{2}|c#)(?!\w))
var regex = XRegExp('(\\b(?:c[+]{2}|c#)(?!\\w))|[^\\s\\p{N}\\p{L}]+', 'ig');
// "$1" restores the text captured by group 1
var replaced = XRegExp.replace(str, regex, "$1");
console.log(replaced);
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>
Note that I added the i flag to make the pattern case insensitive, \b at the beginning of the alternation so that it only matches at a word boundary (c++ and c# both start with a letter, which is a word character), and the lookahead (?!\w), which makes sure there is no word character after + or # (\b would not work there, since + and # are not word characters).
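Since the question asks for a list of allowed words, the same capture-and-restore idea can be sketched with the alternation built from an array. This is only a sketch under some assumptions: it uses the native RegExp with the u flag (Unicode property escapes, available in recent Node.js versions) rather than XRegExp, the names escapeRegExp and stripSpecialChars are hypothetical helpers, and every allowed word is assumed to start with a word character so that \b still applies.

```javascript
// Hypothetical helper: escape regex syntax characters in a literal word.
// Only syntax characters are escaped, because the u flag rejects other escapes.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: strip everything that is not whitespace, a digit or a
// letter, except for the words listed in allowedWords.
function stripSpecialChars(text, allowedWords) {
  var strip = '[^\\s\\p{N}\\p{L}]+';
  if (allowedWords.length === 0) {
    // No exceptions: fall back to the plain "remove everything else" pattern.
    return text.replace(new RegExp(strip, 'gu'), '');
  }
  // Longest words first, so "c++" is tried before a shorter entry like "c+".
  var alternatives = allowedWords
    .slice()
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp)
    .join('|');
  // Group 1 captures an allowed word; the second alternative matches junk.
  var pattern = new RegExp(
    '(\\b(?:' + alternatives + ')(?!\\w))|' + strip,
    'giu'
  );
  // "$1" restores an allowed word and is empty when only junk matched.
  return text.replace(pattern, '$1');
}

var str = "I do programming in c++ and sometimes c#, but + and # should be removed";
console.log(stripSpecialChars(str, ['c++', 'c#']));
```

Note that the comma after c# is removed as well, and the spaces around the removed + and # stay in place, so a follow-up whitespace cleanup may be wanted.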