java apache-commons html-escape-characters unicode-escapes

Escape Unicode Character 'POPCORN' to HTML Entity


I have a string with an emoji in it

I love 🍿

I need to escape that popcorn emoji with its HTML entity so I get

I love &#x1f37f;

I am writing my code in Java and have tried several StringEscapeUtils libraries, but none of them has worked. Please help me figure out what I can use to escape special characters like Popcorn.

For reference:

Unicode Character Information

Unicode 8.0 (June 2015)


Solution

  • I would use CharSequence::codePoints to get an IntStream of the code points, map each one to a string (a hexadecimal numeric character reference for anything above 127, the character itself otherwise), and then collect them by joining into a single string:

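    // Requires: import java.util.stream.Collectors;
    public String escape(final String s) {
        return s.codePoints()
            // Non-ASCII code points become hex numeric character references.
            .mapToObj(codePoint -> codePoint > 127 ?
                "&#x" + Integer.toHexString(codePoint) + ";" :
                new String(Character.toChars(codePoint)))
            .collect(Collectors.joining());
    }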
    

    For the specified input, this produces:

    I love &#x1f37f;
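
    To make it easier to try, here is a minimal self-contained sketch of the same idea. The class name HtmlEntityEscaper and the method names escapeNonAscii and escapePerChar are hypothetical, chosen only for this illustration. The second method is a deliberate counterexample: it walks UTF-16 chars instead of code points, so the two surrogate halves of the emoji are escaped separately, which is not what you want. Note that neither method touches ASCII characters such as &, < or >, so this is not a full HTML escaper.

```java
import java.util.stream.Collectors;

// Hypothetical demo class; the names here are illustrative only.
public class HtmlEntityEscaper {

    // Replaces every code point above 127 with a hex numeric character reference.
    public static String escapeNonAscii(final String s) {
        return s.codePoints()
            .mapToObj(cp -> cp > 127
                ? "&#x" + Integer.toHexString(cp) + ";"
                : String.valueOf((char) cp))
            .collect(Collectors.joining());
    }

    // Counterexample: iterating UTF-16 chars splits the emoji into its two surrogates.
    public static String escapePerChar(final String s) {
        return s.chars()
            .mapToObj(c -> c > 127
                ? "&#x" + Integer.toHexString(c) + ";"
                : String.valueOf((char) c))
            .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // U+1F37F POPCORN written as its UTF-16 surrogate pair.
        String input = "I love \uD83C\uDF7F";

        System.out.println(input.length());                          // 9 (UTF-16 units)
        System.out.println(input.codePointCount(0, input.length())); // 8 (code points)
        System.out.println(escapeNonAscii(input)); // I love &#x1f37f;
        System.out.println(escapePerChar(input));  // I love &#xd83c;&#xdf7f;
    }
}
```

    The difference between the last two lines is the reason for using codePoints() rather than chars() or a plain char loop: characters outside the Basic Multilingual Plane occupy two chars in a Java String, and only the code point view yields the single value 0x1f37f.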