
Java method for decoding via unicode_escape


Python 2 has a handy way to decode hex and Unicode escape sequences in a string:

print "123\x20Fake\x20St\u002e".decode('unicode_escape')

This prints:

123 Fake St.

Is there anything similar in Java, or does this have to be handled with regexes?

EDIT 1

I believe my question is different from this one, since that question seems to ask about decoding a hex-only string. Mine mixes hex and Unicode escapes.


Solution

  • If you try to use that string as a Java string literal, you get a compile error, since \x is not a valid escape sequence in Java. You can use either the Unicode form, replacing \x with \u00, or the octal form, which is just a backslash followed by the octal digits (no letter prefix).

    Unicode:

    System.out.println("123\u0020Fake\u0020St\u002e"); // 123 Fake St.
    

    Octal (hex 20 is octal 40):

    System.out.println("123\40Fake\40St\u002e"); // 123 Fake St.
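    A minimal self-contained check of the two literal forms above (the class name EscapeForms is just for illustration). Both constants should compile to the same string, because the compiler translates Unicode escapes before lexing and \40 is the octal escape for code point 32, a space:

    ```java
    public class EscapeForms {
        // Unicode-escape form: translated by the compiler before tokenizing.
        static final String UNICODE_FORM = "123\u0020Fake\u0020St\u002e";
        // Octal form: backslash plus octal digits 4 and 0 (decimal 32, a space).
        static final String OCTAL_FORM = "123\40Fake\40St\u002e";

        public static void main(String[] args) {
            System.out.println(UNICODE_FORM);
            System.out.println(OCTAL_FORM);
            System.out.println(UNICODE_FORM.equals(OCTAL_FORM));
            System.out.println(Integer.toOctalString(0x20));
        }
    }
    ```

    Running it should print 123 Fake St. twice, then true, then 40.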
    

    On the other hand, the string may contain the escape sequences themselves:

    String escaped = "123\\x20Fake\\x20St\\u002e";
    

    At runtime this string holds the literal characters 123\x20Fake\x20St\u002e, backslashes included.

    You can use this answer to unescape the Unicode escape sequences, but you have to handle the hex escapes first. You can do that by replacing \x with \u00, as mentioned before:

    String unicodeEscaped = escaped.replaceAll("\\\\x", "\\\\u00");
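    As a quick sketch (the class and method names are hypothetical), this is what the replacement step produces on its own. The regex matches a literal backslash followed by x, and the replacement writes a literal backslash followed by u00, so each two-digit hex escape becomes a four-digit Unicode escape:

    ```java
    public class HexToUnicodeEscape {
        // Hypothetical helper: rewrites each backslash-x prefix as
        // backslash-u followed by two zeros.
        static String hexToUnicodeEscapes(String escaped) {
            // The regex matches a literal backslash followed by x; the
            // replacement emits a literal backslash followed by u00.
            return escaped.replaceAll("\\\\x", "\\\\u00");
        }

        public static void main(String[] args) {
            String escaped = "123\\x20Fake\\x20St\\u002e";
            System.out.println(escaped);
            System.out.println(hexToUnicodeEscapes(escaped));
        }
    }
    ```

    It should print 123\x20Fake\x20St\u002e and then 123\u0020Fake\u0020St\u002e. The backslashes are still there; turning the escapes into characters is the next step.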
    

    Then use any of the methods from that answer to unescape the Unicode escapes. With Apache Commons Text, it would look something like this:

    import org.apache.commons.text.StringEscapeUtils;

    String decoded = StringEscapeUtils.unescapeJava("123\\x20Fake\\x20St\\u002e"
            .replaceAll("\\\\x", "\\\\u00")); // 123 Fake St.
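    If you would rather not add a dependency, a single regex pass can decode both kinds of escape, which also covers the regex route the question asks about. This is a sketch, not a library API: EscapeDecoder and decode are hypothetical names, and it only handles \xHH and \uXXXX (unlike unescapeJava, it leaves \n, octal escapes and the like untouched):

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class EscapeDecoder {
        // Group 1: two hex digits after backslash-x.
        // Group 2: four hex digits after backslash-u.
        private static final Pattern ESCAPE =
                Pattern.compile("\\\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4}))");

        // Replaces every hex or Unicode escape with the char it encodes;
        // all other text is copied through unchanged.
        static String decode(String input) {
            Matcher m = ESCAPE.matcher(input);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String hex = m.group(1) != null ? m.group(1) : m.group(2);
                char decoded = (char) Integer.parseInt(hex, 16);
                // quoteReplacement keeps a decoded backslash or dollar sign literal
                m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
            }
            m.appendTail(out);
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(decode("123\\x20Fake\\x20St\\u002e")); // 123 Fake St.
        }
    }
    ```

    One limitation of this sketch: it does not treat an escaped backslash specially, so in an input like \\x20 (a literal backslash followed by \x20) the \x20 part is still decoded. On Java 9 and later, the find/appendReplacement loop could also be written with Matcher.replaceAll taking a function.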