I have never worked with umlauts or special characters in JavaScript strings before. My problem is: how do I remove them?
For example, I have this in JavaScript:
var oldstr = "Bayern München";
var str = oldstr.split(' ').join('-');
The result is Bayern-München, which was easy, but now I want to remove the umlaut or special character, as in:
Real Sporting de Gijón.
How can I do this?
Kind regards,
Frank
replace should be able to do it for you, e.g.:
var str = str.replace(/ü/g, 'u');
...of course ü and u are not the same letter. :-)
If you're trying to replace all characters outside a given range with something (like a -), you can do that by specifying a range:
var str = str.replace(/[^A-Za-z0-9\-_]/g, '-');
That replaces all characters that aren't English letters, digits, -, or _ with -. (The character range is the [...] bit; the ^ at the beginning means "not".)
But that ("Bayern-M-nchen") may be a bit unpleasant for Mr. München to look at. :-) You could use a function passed into replace to try to just drop diacriticals:
var str = str.replace(/[^A-Za-z0-9\-_]/g, function(ch) {
    // Characters that look a bit like 'a'
    if ("áàâä".indexOf(ch) >= 0) { // There are a lot more than this
        return 'a';
    }
    // Characters that look a bit like 'u'
    if ("úùûü".indexOf(ch) >= 0) { // There are a lot more than this
        return 'u';
    }
    /* ...long list of others...*/
    // Default
    return '-';
});
The above makes a single pass over the string, which should suit long strings. If the string itself is short, you may be better off with repeated regexps:
var str = str.replace(/[áàâä]/g, 'a')
             .replace(/[úùûü]/g, 'u')
             .replace(/[^A-Za-z0-9\-_]/g, '-');
...but that's speculative.
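As the "long list of others" grows, the callback version can be driven by a lookup table instead of a chain of if statements. Here's a minimal sketch of that idea; the names DIACRITIC_MAP and toSlug are made up for illustration, and the table only lists a handful of characters, so treat it as a starting point rather than a complete mapping:

```javascript
// Hypothetical lookup table: only a few characters are listed here,
// a real one would need many more entries.
var DIACRITIC_MAP = {
    "\u00e1": "a", "\u00e0": "a", "\u00e2": "a", "\u00e4": "a", // á à â ä
    "\u00fa": "u", "\u00f9": "u", "\u00fb": "u", "\u00fc": "u", // ú ù û ü
    "\u00f3": "o"                                               // ó
};

// Hypothetical helper: maps known characters via the table and
// replaces everything else outside A-Z, a-z, 0-9, - and _ with "-".
function toSlug(input) {
    return input.replace(/[^A-Za-z0-9\-_]/g, function(ch) {
        return Object.prototype.hasOwnProperty.call(DIACRITIC_MAP, ch)
            ? DIACRITIC_MAP[ch]
            : "-";
    });
}

console.log(toSlug("Bayern M\u00fcnchen"));         // Bayern-Munchen
console.log(toSlug("Real Sporting de Gij\u00f3n")); // Real-Sporting-de-Gijon
```

Adding a character then only means adding one entry to the table; the replace call itself stays unchanged.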
Note that literal characters in JavaScript strings are totally fine, but you can run into fun with the encoding of files. I tend to stick to Unicode escapes. So for instance, the chained replacements above would be:
var str = str.replace(/[\u00e4\u00e2\u00e0\u00e1]/g, 'a')
             .replace(/[\u00fc\u00fb\u00f9\u00fa]/g, 'u')
             .replace(/[^A-Za-z0-9\-_]/g, '-');
...but again, there are a lot more to do...
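If maintaining that list by hand gets tedious, a different technique (not the one shown above) may be worth a try: Unicode normalization. This is a sketch that assumes an engine with ES2015's String.prototype.normalize; the function names stripDiacritics and slugify are made up for illustration. Normalizing to NFD splits a letter like ü into a plain u followed by a combining mark, and the combining marks in the range U+0300 to U+036F can then be removed:

```javascript
// Hypothetical helper: decompose accented letters (NFD), then drop the
// combining diacritical marks (U+0300 to U+036F) left behind.
function stripDiacritics(input) {
    return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical helper: strip diacritics first, then replace anything
// that still isn't an English letter, digit, - or _ with "-".
function slugify(input) {
    return stripDiacritics(input).replace(/[^A-Za-z0-9\-_]/g, "-");
}

console.log(slugify("Bayern M\u00fcnchen"));         // Bayern-Munchen
console.log(slugify("Real Sporting de Gij\u00f3n")); // Real-Sporting-de-Gijon
```

Letters that have no decomposition, such as ß or ø, are untouched by the normalization step and so still end up as - here; those would need explicit replacements like the ones earlier.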