I am using StringEscapeUtils#escapeJava to escape strings. It transforms the character "é" (LATIN SMALL LETTER E WITH ACUTE) into "\u00E9" and "😅" (SMILING FACE WITH OPEN MOUTH AND COLD SWEAT) into "\uD83D\uDE05". If I unescape them, both revert to their original form. But I want to unescape "\u00E9" to "é" while keeping "\uD83D\uDE05" as it is. What should I do so that the letters are unescaped but the emojis stay escaped?
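Some background on why the emoji becomes two escapes. Java strings are UTF-16. "é" (U+00E9) fits in a single code unit. "😅" (U+1F605) lies outside the Basic Multilingual Plane, so it is stored as a surrogate pair, and escapeJava emits one escape per code unit. The short stdlib-only check below shows this. The class name is hypothetical, and the characters are built from code points so the source file encoding does not matter.

```java
public class Utf16Demo {
    public static void main(String[] args) {
        // U+00E9 (é): inside the BMP, a single UTF-16 code unit.
        String accented = String.valueOf((char) 0x00E9);
        // U+1F605 (😅): outside the BMP, encoded as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F605));

        System.out.println(accented.length()); // 1
        System.out.println(emoji.length());    // 2

        // The two code units are exactly the values escapeJava writes out.
        System.out.printf("%04X %04X%n", (int) emoji.charAt(0), (int) emoji.charAt(1)); // D83D DE05
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(emoji.charAt(1)));  // true
        System.out.println(Character.isSurrogate(accented.charAt(0)));  // false
    }
}
```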
It might be easier to "fully unescape" the string and then re-escape just the emoji. You can do that by detecting the characters that form surrogate pairs, using Character.isHighSurrogate and Character.isLowSurrogate.
For example:
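```java
// str is the fully unescaped string, e.g. StringEscapeUtils.unescapeJava(escaped)
StringBuilder sb = new StringBuilder(str.length());
for (int i = 0; i < str.length(); ++i) {
    char c = str.charAt(i);
    if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) {
        // Surrogate half (part of an emoji or other supplementary character):
        // re-escape it, using uppercase hex to match escapeJava's output.
        sb.append("\\u");
        sb.append(String.format("%04X", (int) c));
    } else {
        // Any other character (including é) is appended as-is.
        sb.append(c);
    }
}
String partlyEscaped = sb.toString();
```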
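A different technique, if you would rather not round-trip through a full unescape (which also turns escapes such as \n or \" back into raw characters), is to work on the escaped string directly. Decode each \uXXXX escape only when the value it encodes is not a surrogate half. The sketch below uses java.util.regex. The class and method names (PartialUnescape, unescapeNonSurrogates) are hypothetical. It only handles \uXXXX escapes and does not account for an escaped backslash that directly precedes the u.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialUnescape {
    // A Java-style unicode escape: backslash, the letter u, four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Hypothetical helper: decodes unicode escapes except surrogate halves,
    // which keep their original escaped text. Other escapes are left untouched.
    static String unescapeNonSurrogates(String escaped) {
        Matcher m = UNICODE_ESCAPE.matcher(escaped);
        StringBuffer sb = new StringBuffer(escaped.length());
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            String replacement = Character.isSurrogate(c)
                    ? m.group()            // keep the escape for emoji halves
                    : String.valueOf(c);   // decode BMP characters such as é
            // quoteReplacement stops the backslash from being treated specially.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = "caf\\u00E9 \\uD83D\\uDE05";
        // Prints café followed by the emoji's two escapes, unchanged.
        System.out.println(unescapeNonSurrogates(escaped));
    }
}
```

Because the emoji escapes are never decoded, this version does not need to reconstruct them afterwards. That also sidesteps the question of what else a full unescape would have changed.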