java apache-commons-lang3

StringEscapeUtils: How to unescape a string except emoji?


I am using StringEscapeUtils#escapeJava to escape strings. It transforms "é" (LATIN SMALL LETTER E WITH ACUTE) into "\u00E9" and "😅" (SMILING FACE WITH OPEN MOUTH AND COLD SWEAT) into "\uD83D\uDE05". If I unescape the result, both revert to their original form. But I want to unescape "\u00E9" to "é" while keeping "\uD83D\uDE05" as it is. How can I unescape the letters but leave the emoji escaped?


Solution

  • It might be easier to "fully unescape" the string (for example with StringEscapeUtils#unescapeJava), and then re-escape just the emoji. You can do that by detecting the surrogate pairs of characters, using Character.isLowSurrogate and Character.isHighSurrogate.

    For example:

    // str is the fully unescaped string.
    StringBuilder sb = new StringBuilder(str.length());
    for (int i = 0; i < str.length(); ++i) {
      char c = str.charAt(i);
      if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) {
        // Append the escaped character, using uppercase hex digits
        // to match the output of escapeJava (e.g. \uD83D).
        sb.append("\\u");
        sb.append(String.format("%04X", (int) c));
      } else {
        // Append the character as-is.
        sb.append(c);
      }
    }
    String partlyEscaped = sb.toString();
    

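    The loop above can also be wrapped in a small helper. The following is only a sketch: the class name EmojiReescape and the method name escapeSurrogates are made up for illustration, and the input literal stands in for the result of StringEscapeUtils.unescapeJava so that the example runs without commons-lang3 on the classpath.

```java
public class EmojiReescape {

    /**
     * Re-escapes every UTF-16 surrogate code unit as a backslash-u escape
     * and leaves all other characters untouched.
     * (Hypothetical helper, not part of commons-lang3.)
     */
    static String escapeSurrogates(String unescaped) {
        StringBuilder sb = new StringBuilder(unescaped.length());
        for (int i = 0; i < unescaped.length(); i++) {
            char c = unescaped.charAt(i);
            if (Character.isSurrogate(c)) {
                // Uppercase hex, matching the style escapeJava produces.
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for: StringEscapeUtils.unescapeJava("caf\\u00E9 \\uD83D\\uDE05")
        String unescaped = "caf\u00E9 \uD83D\uDE05";
        System.out.println(escapeSurrogates(unescaped));
    }
}
```

    Note that this treats "emoji" as "anything encoded with a surrogate pair". Other supplementary characters (outside the Basic Multilingual Plane) would be re-escaped as well, and emoji that fit in a single char, such as "☺" (U+263A), would not be.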