I'm looking to count the number of perceived emoji characters in a provided Java string. I'm currently using the emoji4j library, but it doesn't work for grapheme clusters like this one: 👩👩👦👦
Calling EmojiUtil.getLength("👩👩👦👦") returns 4 instead of 1, and similarly calling EmojiUtil.getLength("👻👩👩👦👦") returns 5 instead of 2.
Are there any APIs or methods on String in Java that make it easy to count grapheme clusters?
I've been hunting around, but understandably the codePoints() method on String counts not only the visible emoji but also the zero-width joiners between them.
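To make the overcount concrete, here is a minimal sketch that dumps the code points of such a string. The class and method names (CodePointDemo, countCodePoints, countJoiners, describe) are made up for illustration, and the emoji are written as escapes on the assumption that the family emoji is the sequence WOMAN, ZWJ, WOMAN, ZWJ, BOY, ZWJ, BOY.

```java
import java.util.stream.Collectors;

class CodePointDemo {
    // Assumed family sequence: WOMAN (U+1F469), ZWJ (U+200D), WOMAN, ZWJ,
    // BOY (U+1F466), ZWJ, BOY. Escapes keep the joiners visible in source.
    static final String FAMILY =
            "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66";

    // GHOST (U+1F47B)
    static final String GHOST = "\uD83D\uDC7B";

    // Number of Unicode code points, joiners included.
    static long countCodePoints(String s) {
        return s.codePoints().count();
    }

    // Number of zero-width joiners (U+200D) in the string.
    static long countJoiners(String s) {
        return s.codePoints().filter(cp -> cp == 0x200D).count();
    }

    // Space-separated list of code points in U+XXXX notation.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String text = GHOST + FAMILY;
        System.out.println(describe(text));
        System.out.println(countCodePoints(text)); // 8
        System.out.println(countJoiners(text));    // 3
    }
}
```

Under that assumption, the ghost plus the family comes to 8 code points, 3 of which are joiners, so neither the raw code point count nor the count with joiners filtered out (5) gives the 2 perceived characters.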
I also attempted this using java.text.BreakIterator:
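    import java.text.BreakIterator;

    public static int getLength(String emoji) {
        // Character instance: iterates over user-perceived character boundaries
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(emoji);
        int emojiCount = 0;
        // Each successful next() moves past one character as the iterator sees it
        while (it.next() != BreakIterator.DONE) {
            emojiCount++;
        }
        return emojiCount;
    }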
But it seems to behave identically to the codePoints() method, returning 8 for something like "👻👩👩👦👦".
I ended up using the ICU library, which worked much better. No changes (aside from import statements) were needed from my original code block, as ICU simply provides a different implementation of BreakIterator.
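For reference, the ICU variant might look roughly like the sketch below. It assumes ICU4J (the com.ibm.icu:icu4j artifact) is on the classpath; the class name IcuGraphemeCounter is hypothetical, and the emoji escapes assume the same WOMAN, ZWJ, WOMAN, ZWJ, BOY, ZWJ, BOY family sequence. Only the import differs from the java.text version.

```java
import com.ibm.icu.text.BreakIterator; // ICU4J, replacing java.text.BreakIterator

class IcuGraphemeCounter {
    // Counts grapheme clusters as segmented by ICU's character break iterator.
    static int getLength(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        // Each next() advances to the following cluster boundary until DONE.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed family sequence with explicit zero-width joiners (U+200D).
        String family =
                "\uD83D\uDC69\u200D\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66";
        String ghost = "\uD83D\uDC7B";
        System.out.println(getLength(family));         // the question wants 1 here
        System.out.println(getLength(ghost + family)); // and 2 here
    }
}
```

The exact segmentation depends on the Unicode rules shipped with the ICU4J version in use, so it is worth checking the counts against the release you actually depend on.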