ruby macos unicode

Ruby poisonous emoji character?


I have Ruby 3.3.4 installed on macOS 14.6.1.

Suppose I have this string in the shell:

$ st="0😀2☺️4🤪6🥳8🥸"
$ echo "$st"
0😀2☺️4🤪6🥳8🥸

If I now feed that string into Ruby, I get the second emoji broken into constituent parts:

$ echo "$st" | ruby -lne 'p $_.split("")'
["0", "😀", "2", "☺", "️", "4", "🤪", "6", "🥳", "8", "🥸"]
                  ^    ^   # should be ONE grapheme

Same if I read that string from a file:

$ cat wee_file
0😀2☺️4🤪6🥳8🥸

$ ruby -lne 'p $_.split("")' wee_file 
["0", "😀", "2", "☺", "️", "4", "🤪", "6", "🥳", "8", "🥸"]

Same thing in IRB:

irb(main):001> File.open('/tmp/wee_file').gets.split("")
=> ["0", "😀", "2", "☺", "️", "4", "🤪", "6", "🥳", "8", "🥸", "\n"]

But if I replace ☺️ with another emoji (which is also multibyte) the issue goes away:

$ st2="0😀2🐱4🤪6🥳8🥸"
$ echo "$st2" | ruby -lne 'p $_.split("")'
["0", "😀", "2", "🐱", "4", "🤪", "6", "🥳", "8", "🥸"]

# also from a file and also in IRB..

Any idea why the emoji ☺️ is producing this result?


Solution

  • It's because ☺️ consists of two Unicode code points:

    1. U+263A (White Smiling Face)
    2. ◌️ U+FE0F (Variation Selector-16)

    The latter requests emoji presentation for the preceding character. Because split("") splits the string at every code point, the selector ends up as an element of its own.

    "☺️".codepoints.map { |c| c.to_s(16) }
    #=> ["263a", "fe0f"]
    

    You can get the expected result via grapheme_clusters or enumerate them via each_grapheme_cluster:

    "0😀2☺️4🤪6🥳8🥸".grapheme_clusters
    #=> ["0", "😀", "2", "☺️", "4", "🤪", "6", "🥳", "8", "🥸"]
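To make the difference between code points and grapheme clusters easier to see, here is a minimal sketch. The variable names (`text_style`, `emoji_style`, `sample`) are made up for illustration, and the strings are built from `\u` escapes so that the invisible selector is explicit. It also shows `scan(/\X/)`, which should behave the same way here, since `\X` matches one extended grapheme cluster in Ruby's regex engine.

```ruby
# Build the strings from explicit escapes so the code points are unambiguous.
text_style  = "\u263A"        # U+263A alone
emoji_style = "\u263A\uFE0F"  # U+263A followed by U+FE0F (Variation Selector-16)

# Shortened version of the string from the question: 0, grinning face, 2, smiley + selector, 4.
sample = "0\u{1F600}2#{emoji_style}4"

# chars (like split("")) yields one element per code point ...
by_code_point = sample.chars
# ... while grapheme_clusters keeps a base character together with its selector.
by_grapheme = sample.grapheme_clusters
# \X matches one extended grapheme cluster, so scan gives the same grouping.
by_regex = sample.scan(/\X/)

p text_style.length                      #=> 1
p emoji_style.length                     #=> 2
p emoji_style.grapheme_clusters.length   #=> 1
p by_code_point.length                   #=> 6
p by_grapheme.length                     #=> 5
p by_regex == by_grapheme                #=> true
```

Applied to the one-liner from the question, this would presumably become `echo "$st" | ruby -lne 'p $_.grapheme_clusters'`.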