I have a String, e2 80 99, which is the hex representation of the UTF-8 encoding of a single character:
U+2019 ’ e2 80 99 RIGHT SINGLE QUOTATION MARK
I want to convert e2 80 99 to its corresponding Unicode code point, U+2019, or even to a plain ' (single quotation mark).
How do I do it?
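Basically, you need to turn the hex values into bytes, decode those bytes as UTF-8 into a String, and then read the code point of the first character. Java Strings are UTF-16, so that code point is the first char alone for a character in the Basic Multilingual Plane such as U+2019, or the first and second chars combined if the character is represented as a surrogate pair. This is a proof of concept: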
import java.nio.charset.StandardCharsets;

public class Utf8HexToCodePoint {
    public static void main(String[] args) {
        // Convert the hex representation of the UTF-8 bytes into a byte array:
        String utf8char = "e2 80 99";
        String[] strNumbers = utf8char.split(" ");
        byte[] rawChars = new byte[strNumbers.length];
        int index = 0;
        for (String strNumber : strNumbers) {
            // Parse as int first: values above 0x7f do not fit in a signed byte.
            rawChars[index++] = (byte) Integer.parseInt(strNumber, 16);
        }
        // Decode the bytes as UTF-8 (Java Strings are "encoded" in UTF-16):
        String utf16Char = new String(rawChars, StandardCharsets.UTF_8);
        // codePointAt combines a surrogate pair into one code point when needed:
        int codePoint = utf16Char.codePointAt(0);
        System.out.println("code point: " + Integer.toHexString(codePoint));
    }
}
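If you need this in more than one place, the same steps can be wrapped in a small helper. The following is only a sketch: the class name Utf8Hex and the methods decode and toCodePoints are hypothetical names chosen for illustration, and it assumes Java 8 or later (for String.codePoints()) and input made of whitespace-separated hex bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and methods are illustrative only.
final class Utf8Hex {

    // Decodes whitespace-separated hex bytes (e.g. "e2 80 99") as UTF-8 text.
    static String decode(String hexBytes) {
        String[] parts = hexBytes.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Parse as int, then narrow: 0x80..0xff do not fit in a signed byte.
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Formats every code point of the decoded text in U+XXXX notation.
    static String toCodePoints(String hexBytes) {
        StringBuilder out = new StringBuilder();
        decode(hexBytes).codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCodePoints("e2 80 99"));    // U+2019
        System.out.println(toCodePoints("f0 9f 98 80")); // U+1F600 (surrogate pair in UTF-16)
        System.out.println(decode("e2 80 99"));          // the character itself
    }
}
```

Iterating over codePoints() instead of chars means a character outside the Basic Multilingual Plane is reported once, not as two surrogates. Note that Integer.parseInt throws NumberFormatException on input that is not hex, and that new String(bytes, UTF_8) silently replaces malformed byte sequences with U+FFFD, so validate the input first if that matters to you.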