I want to take a string of emoji and do something with the individual characters.
In JavaScript, "🐴👄😄⛔🎠🚓🚇".length == 13 because "⛔" has length 1 and the rest have length 2. So we can't just do:
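const string = "👨‍👨‍👧‍👧 👦🏾 🐴 👄 😄 ⛔ 🎠 🚓 🚇";
const s = string.split(""); // splits on UTF-16 code units, so surrogate pairs are torn apart
console.log(s);
const a = Array.from(string); // splits on code points, so ZWJ sequences and skin-tone modifiers still come apart
console.log(a);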
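To make the difference between the two attempts concrete, here is a minimal sketch that compares UTF-16 code units with code points. The variable names are only illustrative, and the strings are written with `\u{...}` escapes (an assumption made here so the example does not depend on how the file is encoded).

```javascript
// 🐴👄😄⛔🎠🚓🚇 written as escapes: six astral emoji plus U+26D4 (⛔), which fits in one code unit.
const simple = "\u{1F434}\u{1F444}\u{1F604}\u26D4\u{1F3A0}\u{1F693}\u{1F687}";
// 👨‍👨‍👧‍👧 is four emoji joined by three zero-width joiners (U+200D).
const family = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467}";

const codeUnits = simple.split("");      // one entry per UTF-16 code unit
const codePoints = Array.from(simple);   // one entry per code point
const familyPoints = Array.from(family); // code points of the family sequence

console.log(codeUnits.length);    // 13: six surrogate pairs plus one single unit
console.log(codePoints.length);   // 7: fine for this simple string
console.log(family.length);       // 11: 4 * 2 code units plus 3 joiners
console.log(familyPoints.length); // 7: one visible character, seven code points
```

So Array.from is enough for emoji that are a single code point each, but anything built from several code points needs grapheme-aware splitting.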
The Grapheme Splitter library by Orlin Georgiev is pretty amazing, although it hasn't been updated in a while and presently (Sep 2020) only supports Unicode 10 and below.
For an updated version of Grapheme Splitter, built in TypeScript with Unicode 13 support, have a look at: https://github.com/flmnt/graphemer
Here is a quick example:
import Graphemer from 'graphemer';
const splitter = new Graphemer();
const string = "🐴👄😄⛔🎠🚓🚇";
splitter.countGraphemes(string); // returns 7
splitter.splitGraphemes(string); // returns array of characters
The library also works with the latest emojis.
For example, "👩🏻‍🦰".length === 7 but splitter.countGraphemes("👩🏻‍🦰") === 1.
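If you want to cross-check these counts without installing anything, the built-in Intl.Segmenter can be used as a rough comparison. To be clear, this is a swapped-in technique and not part of Graphemer: the helper name `toGraphemes` is hypothetical, and the result depends on the Unicode data shipped with your JavaScript runtime, so treat it as a sketch.

```javascript
// Intl.Segmenter is part of the ECMAScript Internationalization API (not Graphemer).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: returns one array entry per grapheme cluster.
function toGraphemes(text) {
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// 👩🏻‍🦰 = woman + light skin tone + zero-width joiner + red hair
const redHair = "\u{1F469}\u{1F3FB}\u200D\u{1F9B0}";
// 🐴👄😄⛔🎠🚓🚇
const simpleEmoji = "\u{1F434}\u{1F444}\u{1F604}\u26D4\u{1F3A0}\u{1F693}\u{1F687}";

console.log(redHair.length);                  // 7 UTF-16 code units
console.log(toGraphemes(redHair).length);     // 1 on runtimes with reasonably recent Unicode data
console.log(toGraphemes(simpleEmoji).length); // 7
```

Older runtimes may not ship Intl.Segmenter at all, which is one reason a library such as Graphemer is still useful.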
Full disclosure: I created the library and did the work to update it to Unicode 13. The API is identical to Grapheme Splitter and is entirely based on that work. It is simply updated to the latest version of Unicode, as the original library hasn't been updated for a couple of years and seems to be no longer maintained.