
API or Method to Replace all non-latin-1 characters


I'm dealing with a third-party API / web service that only allows the Latin-1 character set in its XML. Is there an existing API or method that will find and replace all non-Latin-1 characters in a String?

For example: Kévin

Is there any way to turn that into Kevin?


Solution

  • Using ICU4J,

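    // Requires: import com.ibm.icu.text.Normalizer;
    public String removeAccents(String text) {
        // Decompose accented characters into base letter + combining mark,
        // then strip the combining marks.
        return Normalizer.decompose(text, false, 0)
                     .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }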
    

    I found this example at http://glaforge.appspot.com/article/how-to-remove-accents-from-a-string

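    Since Java 1.6 the necessary normalizer is built in as java.text.Normalizer, so Normalizer.normalize(text, Normalizer.Form.NFD) can take the place of the ICU4J decompose call.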
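
    For reference, here is a minimal sketch of the same idea using only the JDK (java.text.Normalizer, Java 1.6+). The class name Latin1Sanitizer, the toLatin1 helper and its placeholder parameter are made up for illustration and are not part of any library. Note that removeAccents strips accents even from characters Latin-1 can represent (such as é), and that characters with no canonical decomposition (for example Ł) are not turned into a base letter, so toLatin1 swaps whatever is still above U+00FF for a placeholder.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
final class Latin1Sanitizer {

    // Decompose to NFD so "é" becomes "e" + U+0301, then drop the combining marks.
    static String removeAccents(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                         .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Strip accents, then replace every code point still outside Latin-1
    // (above U+00FF) with the given placeholder.
    static String toLatin1(String text, String placeholder) {
        String stripped = removeAccents(text);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < stripped.length(); ) {
            int cp = stripped.codePointAt(i);
            if (cp <= 0xFF) {
                out.appendCodePoint(cp);
            } else {
                out.append(placeholder);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAccents("K\u00e9vin"));         // Kevin
        System.out.println(toLatin1("K\u00e9vin \u4e2d", "?"));  // Kevin ?
    }
}
```

    If accents that Latin-1 already supports should be kept, one option is to skip removeAccents for code points at or below U+00FF and only normalize the rest.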