Possible Duplicate:
ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars
How to replace special characters in a string?
I would like to format a String such as "I>Télé" into something like "itele".
The idea is that I want my String to be lower case (done), without whitespace (done), and without accents or special characters (like >, <, /, %, ~, é, @, ï, etc.).
It is okay to delete occurrences of special characters, but I want to keep letters while removing their accents (as in my example). Here is what I did, but I don't think a good solution is to replace every é, è, ê, ë with "e", then do it again for "i", "a", etc., and then remove every special character...
String name = "I>télé"; // example
String result = name.toLowerCase().replace(" ", "").replace("é","e").........;
The purpose is to produce a valid filename for resources in an Android app, so if you have any other ideas, I'll take them!
You can use the java.text.Normalizer class to convert your text into plain Latin characters followed by diacritic marks (accents), where possible. For example, the single-character string "é" would become the two-character string ['e', {COMBINING ACUTE ACCENT}].
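To make the decomposition concrete, here is a minimal sketch that normalizes a precomposed "é" to NFD and inspects the result. The class name DecompositionDemo and the helper decompose are hypothetical names chosen for illustration; only Normalizer.normalize and Normalizer.Form.NFD come from the JDK.

```java
import java.text.Normalizer;

// Hypothetical demo class: shows what NFD does to a single accented letter.
class DecompositionDemo {
    // Returns the canonical decomposition (NFD) of the input string.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // "\u00E9" is the precomposed letter e-acute, written as an escape
        // so the example does not depend on the source file encoding.
        String composed = "\u00E9";
        String decomposed = decompose(composed);

        System.out.println(composed.length());                 // 1
        System.out.println(decomposed.length());               // 2
        System.out.println(decomposed.charAt(0));              // e
        // The second char is U+0301 COMBINING ACUTE ACCENT.
        System.out.println(decomposed.charAt(1) == '\u0301');  // true
    }
}
```

Letters that have no canonical decomposition (for example "ø" or "ß") are left unchanged by NFD, so they need separate handling if you want to keep a base letter for them.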
After you've done this, your String will be a combination of unaccented characters, accent modifiers, and the other special characters you've mentioned. At this point you can filter the characters in your string with a whitelist that keeps only what you want (which could be as simple as [A-Za-z0-9] for a regex, depending on what you're after).
An approach might look like:
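import java.text.Normalizer;
import java.text.Normalizer.Form;
String name = "I>télé"; // example
String normalized = Normalizer.normalize(name, Form.NFD); // "é" becomes 'e' + combining accent
String result = normalized.replaceAll("[^A-Za-z0-9]", ""); // keeps ASCII letters and digits: "Itele"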
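Putting the steps together with the lower-casing from the question, a self-contained sketch could look like the following. The class ResourceNames and the method toResourceName are hypothetical names, and the whitelist [A-Za-z0-9] is only one possible choice; adjust it (for example to allow underscores) to match what your resource names need.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: one way to combine normalization, filtering and lower-casing.
class ResourceNames {
    static String toResourceName(String input) {
        // 1. Split accented letters into base letter + combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // 2. Drop everything except ASCII letters and digits
        //    (this removes whitespace, symbols and the combining marks).
        String asciiOnly = decomposed.replaceAll("[^A-Za-z0-9]", "");
        // 3. Lower-case with a fixed locale so the result does not depend on
        //    the device language (e.g. 'I' in a Turkish locale).
        return asciiOnly.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "I>T\u00E9l\u00E9" is the question's "I>Télé", escaped to avoid encoding issues.
        System.out.println(toResourceName("I>T\u00E9l\u00E9")); // itele
    }
}
```

This sketch does not guard against an empty result or a name that starts with a digit, so you may want an extra check if your input can consist only of special characters or numbers.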