
Convert UTF-8 characters to Latin-1 in Java


In my project I read Strings from a database that I cannot change because of permissions. I take a string in an unknown encoding and convert it to UTF-8 without any problem, for instance:

import java.nio.charset.Charset;

String countryName = "ESPAÑA"; // read from the database in an unknown encoding
String utf8 = new String(countryName.getBytes(), Charset.forName("UTF-8"));
System.out.println(utf8); // prints a garbled string, but it should print ESPAÑA

I need to take every string that has been converted to UTF-8 this way and convert it to Latin-1.

I have found many methods on this site, but none of them works correctly.


Solution

  • If you don't know the encoding of the original bytes, you can't transcode them to a known form. I wrote a paper for the Unicode Consortium on this problem; see "Mapping Text in Unspecified Character Sets to Unicode as a Canonical Representation in a Hostile Environment".

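    This code, new String(countryName.getBytes(), Charset.forName("UTF-8")), means "I have these bytes in UTF-8; decode them into a Java String." However, getBytes() with no argument encodes the string with the JVM's default charset, so unless that default is UTF-8, the bytes are not UTF-8 and the decoded result is garbled.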

    UTF-8 can encode the full range of Unicode code points (U+0000 to U+10FFFF, about 1.1 million). Latin-1 can encode only 2^8 = 256 characters.

    So transcoding from UTF-8 to Latin-1 is lossy: any character outside Latin-1 cannot be represented, and you will need to handle those unmappable characters explicitly.

    Transcoding from Latin-1 to UTF-8 is fine, as all characters in Latin-1 are supported in UTF-8.
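A minimal sketch of the getBytes() point, assuming a plain Java program with no database involved. The class and method names (CharsetMismatchDemo, roundTrips) are hypothetical, for illustration only. It encodes "ESPAÑA" with one charset and decodes the bytes with another; the text survives only when both sides agree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical helper: encodes text with one charset, decodes the bytes with another,
    // and checks whether the original text survived. The question's code does the same
    // thing with encodeWith = platform default and decodeWith = UTF-8.
    public static boolean roundTrips(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith).equals(text);
    }

    public static void main(String[] args) {
        // "ESPAÑA", written with a Unicode escape so the source file encoding does not matter.
        String countryName = "ESPA\u00D1A";

        // 'Ñ' (U+00D1) is two bytes in UTF-8 (0xC3 0x91) but one byte in Latin-1 (0xD1).
        System.out.println(countryName.getBytes(StandardCharsets.UTF_8).length);      // 7
        System.out.println(countryName.getBytes(StandardCharsets.ISO_8859_1).length); // 6

        // Same charset on both sides: the text comes back unchanged.
        System.out.println(roundTrips(countryName, StandardCharsets.UTF_8, StandardCharsets.UTF_8));           // true
        System.out.println(roundTrips(countryName, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)); // true

        // Mismatched charsets: the 'Ñ' is garbled.
        System.out.println(roundTrips(countryName, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // false
        System.out.println(roundTrips(countryName, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)); // false

        // The no-argument getBytes() in the question uses this charset.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Note that the no-argument getBytes() depends on the JVM's default charset: since Java 18 (JEP 400) the default is UTF-8 on all platforms, while older JDKs use the platform encoding. Either way, if the string was already decoded with the wrong charset when it was read from the database, re-encoding it in Java does not repair it unless you know which charset was actually used.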
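For the lossy UTF-8 to Latin-1 direction, one way to get the lost-character handling mentioned above is a CharsetEncoder configured to report unmappable characters, so that anything outside Latin-1 (such as '€') raises an exception instead of being silently replaced. This is a sketch, not a complete solution; the names Latin1Transcoder and toLatin1Strict are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1Transcoder {

    // Hypothetical helper: encodes text as ISO-8859-1 and throws if any character has no
    // Latin-1 mapping, instead of silently substituting '?' as String.getBytes(charset) does.
    public static byte[] toLatin1Strict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // encode() resets, encodes and flushes; the returned buffer is ready for reading.
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String[] samples = {"ESPA\u00D1A", "\u20AC 10"}; // "ESPAÑA" and "€ 10"
        for (String s : samples) {
            try {
                byte[] latin1 = toLatin1Strict(s);
                System.out.println(s + " -> " + latin1.length + " Latin-1 bytes");
            } catch (CharacterCodingException e) {
                // '€' (U+20AC) is not in ISO-8859-1, so the encoder reports it.
                System.out.println(s + " -> cannot be encoded as Latin-1: " + e);
            }
        }

        // The lenient API never throws; unmappable characters become '?'.
        byte[] lenient = "\u20AC 10".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(lenient, StandardCharsets.ISO_8859_1)); // ? 10
    }
}
```

Whether to fail, skip (CodingErrorAction.IGNORE) or substitute (CodingErrorAction.REPLACE) is an application decision; the strict variant at least makes the data loss visible instead of hiding it.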