Tags: java, character, emoji, grapheme

How to count grapheme clusters or "perceived" emoji characters in Java


I'm looking to count the number of perceived emoji characters in a provided Java string. I'm currently using the emoji4j library, but it doesn't work for grapheme clusters like this one: 👩‍👩‍👦‍👦

Calling EmojiUtil.getLength("👩‍👩‍👦‍👦") returns 4 instead of 1, and similarly calling EmojiUtil.getLength("👻👩‍👩‍👦‍👦") returns 5 instead of 2.

Are there any APIs or methods on String in Java that make it easy to count grapheme clusters?

I've been hunting around, but the codePoints() method on String counts not only the visible emoji but also the zero-width joiners (U+200D) that glue them together.
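To make the code point problem concrete, here is a minimal sketch using only the JDK. The class name CodePointDemo and its helper methods are made up for illustration; the strings are written with escapes so the example does not depend on source file encoding.

```java
final class CodePointDemo {
    // ZERO WIDTH JOINER, the invisible code point that glues emoji into one cluster
    static final int ZWJ = 0x200D;

    // woman + ZWJ + woman + ZWJ + boy + ZWJ + boy (the family emoji from the question)
    static final String FAMILY =
            "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66";

    // ghost emoji, U+1F47B
    static final String GHOST = "\uD83D\uDC7B";

    // Number of Unicode code points, joiners included
    static long countCodePoints(String s) {
        return s.codePoints().count();
    }

    // Number of zero-width joiners in the string
    static long countJoiners(String s) {
        return s.codePoints().filter(cp -> cp == ZWJ).count();
    }

    public static void main(String[] args) {
        System.out.println(FAMILY.length());                 // 11 UTF-16 code units
        System.out.println(countCodePoints(FAMILY));         // 7 = 4 emoji + 3 joiners
        System.out.println(countJoiners(FAMILY));            // 3
        System.out.println(countCodePoints(GHOST + FAMILY)); // 8
    }
}
```

Neither length() nor codePoints() knows which code points belong together, so neither yields the perceived count of 1 (or 2 with the ghost prepended).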

I also attempted this using the BreakIterator:

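import java.text.BreakIterator;

public static int getLength(String emoji) {
    // Character instance: iterates over what the JDK treats as user-perceived characters
    BreakIterator it = BreakIterator.getCharacterInstance();
    it.setText(emoji);
    int emojiCount = 0;
    // Each next() call advances one boundary; DONE marks the end of the text
    while (it.next() != BreakIterator.DONE) {
        emojiCount++;
    }
    return emojiCount;
}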

But it seems to behave identically to the codePoints() method, returning 8 for something like "👻👩‍👩‍👦‍👦".


Solution

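  • I ended up using the ICU library (ICU4J), which worked much better. No changes other than the import statement were needed from my original code block: ICU4J's com.ibm.icu.text.BreakIterator offers the same getCharacterInstance(), setText(), next(), and DONE as java.text.BreakIterator, just with a different implementation behind them.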
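
For reference, a sketch of what the ICU-based version might look like. It assumes ICU4J is on the classpath (Maven coordinates com.ibm.icu:icu4j); the class name GraphemeCounter and method name countGraphemes are made up for illustration, and the exact results can depend on the ICU4J version and the Unicode data it ships with.

```java
import com.ibm.icu.text.BreakIterator;

final class GraphemeCounter {
    // Counts grapheme clusters using ICU4J's character break iterator.
    // Same loop as the java.text version; only the import differs.
    static int countGraphemes(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // woman + ZWJ + woman + ZWJ + boy + ZWJ + boy
        String family =
                "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66";
        String ghost = "\uD83D\uDC7B";

        // The question wants 1 and 2 here, respectively
        System.out.println(countGraphemes(family));
        System.out.println(countGraphemes(ghost + family));
    }
}
```

Because the ICU class mirrors the JDK one, switching back later is again just a matter of changing the import.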