
Translate Unicode to ASCII in Ruby


I have a Unicode string and need to translate it into pure ASCII.

t = "\xf0\x9d\x97\x94\xf0\x9d\x98\x82\xf0\x9d\x97\xb4\xf0\x9d\x98\x82\xf0\x9d\x98\x80\xf0\x9d\x98\x81"

My first try was unsuccessful:

t.encode('ASCII', invalid: :replace, undef: :replace, replace: '')
=> ""

Translating the string using Unicode normalization works:

t.unicode_normalize :nfkd
=> "August"

Is there a better solution? It should be gem-independent and work with Ruby 2.x (String#unicode_normalize is unavailable on 2.1 and earlier versions).


Solution

  • You could translate the Unicode characters to their ASCII equivalents via tr:

    t.tr("𝗔-𝗭𝗮-𝘇", 'A-Za-z')
    #=> "August"
    

    or, using their codepoints:

    t.tr("\u{1D5D4}-\u{1D5ED}\u{1D5EE}-\u{1D607}", "A-Za-z")
    #=> "August"
    

    Make sure that t is UTF-8 encoded (t.encoding should report UTF-8), since the ranges above are written as UTF-8 characters.

    Also note that there are other stylized forms in the Mathematical Alphanumeric Symbols block which you might want to translate accordingly.
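To cover several of those stylized alphabets at once, the same idea can be driven by a lookup table. The following is only a sketch: the names STYLED_ALPHABET_STARTS, STYLED_TO_ASCII and styled_to_ascii are made up for illustration, and the table lists just a handful of styles whose A-Z/a-z letters occupy 52 consecutive code points. Styles with gaps (for example the plain italic run, whose small h lives outside the block) would need extra handling.

```ruby
# Starting code points of a few styles in the Mathematical Alphanumeric
# Symbols block whose A-Z and a-z letters form one contiguous run of 52.
# (Illustrative selection, not an exhaustive list.)
STYLED_ALPHABET_STARTS = [
  0x1D400, # bold
  0x1D5A0, # sans-serif
  0x1D5D4, # sans-serif bold
  0x1D608, # sans-serif italic
  0x1D63C, # sans-serif bold italic
  0x1D670  # monospace
].freeze

ASCII_LETTERS = [*'A'..'Z', *'a'..'z'].freeze

# Hypothetical lookup table: styled character => plain ASCII letter.
STYLED_TO_ASCII = STYLED_ALPHABET_STARTS.each_with_object({}) do |start, map|
  ASCII_LETTERS.each_with_index do |letter, i|
    map[[start + i].pack('U')] = letter # pack('U') builds a UTF-8 string
  end
end.freeze

# Hypothetical helper: replaces the styled letters listed above and leaves
# every other character untouched. Assumes str is UTF-8 encoded.
def styled_to_ascii(str)
  str.gsub(/[\u{1D400}-\u{1D7FF}]/) { |ch| STYLED_TO_ASCII.fetch(ch, ch) }
end

t = "\u{1D5D4}\u{1D602}\u{1D5F4}\u{1D602}\u{1D600}\u{1D601}"
puts styled_to_ascii(t) # prints "August"
```

If the bytes arrive tagged with a different encoding (for example read as binary), something like t.dup.force_encoding('UTF-8') before the call should relabel them without changing the bytes.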