I need to convert user input like "𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖" into plain "ASCII" text, i.e. "jovy debbie".
The input comes in different styles, e.g. "𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔"
or "πΆππππππ’π π½πππππ π»πππππ"
.
Any help will be appreciated. I have already looked at other Stack Overflow questions, but no luck :(
Those letters are from the Mathematical Alphanumeric Symbols block.
Since they have a fixed offset to their ASCII counterparts, you could use tr
to map them, e.g.:
"𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖".tr("𝕒-𝕫", "a-z")
#=> "jovy debbie"
The same approach can be used for the other styles and to map lower / upper case, e.g.:
"𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔".tr("𝒂-𝒛𝑨-𝒁", "a-zA-Z")
#=> "Jenica Dugos"
This gives you full control over the character mapping.
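If several styles have to be handled at once, the mapping can also be generated from code points instead of typing every range by hand. The following is only a sketch: STYLE_STARTS, STYLED_TO_ASCII and to_ascii are made-up names, and only three styles are listed. It assumes the 26 letters of each listed range are contiguous, which is why the double-struck capitals (whose range has gaps) are left out.

```ruby
# Code points of the styled "A" and "a" per style (illustrative subset only).
STYLE_STARTS = {
  bold_italic:   { "A" => 0x1D468, "a" => 0x1D482 },
  double_struck: { "a" => 0x1D552 }, # capitals omitted: that range has gaps
  monospace:     { "A" => 0x1D670, "a" => 0x1D68A }
}.freeze

# Build a lookup table: styled character => ASCII letter.
STYLED_TO_ASCII = STYLE_STARTS.each_value.with_object({}) do |starts, map|
  starts.each do |ascii_start, styled_start|
    26.times do |i|
      styled = [styled_start + i].pack("U")
      map[styled] = (ascii_start.ord + i).chr(Encoding::UTF_8)
    end
  end
end.freeze

# Hypothetical helper: replaces known styled letters, leaves everything else alone.
def to_ascii(text)
  # Only look at characters in the Mathematical Alphanumeric Symbols block.
  text.gsub(/[\u{1D400}-\u{1D7FF}]/) { |char| STYLED_TO_ASCII.fetch(char, char) }
end

to_ascii("𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖")
#=> "jovy debbie"
to_ascii("𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔")
#=> "Jenica Dugos"
```

Characters from styles that are not in the table pass through unchanged, so missing styles are easy to spot and add.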
Alternatively, you could try Unicode normalization. The NFKC / NFKD forms should remove most formatting and seem to work for your examples:
"𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖".unicode_normalize(:nfkc)
#=> "jovy debbie"
"𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔".unicode_normalize(:nfkc)
#=> "Jenica Dugos"
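Normalization does not guarantee an ASCII-only result, and it also rewrites other compatibility characters (for example, the "ﬁ" ligature becomes "fi"). A small check can show what is left over after normalizing. This is a sketch only, and normalize_to_ascii is a made-up helper name:

```ruby
# Hypothetical helper: normalizes with NFKC and reports any characters
# that are still outside ASCII, instead of silently dropping them.
def normalize_to_ascii(text)
  normalized = text.unicode_normalize(:nfkc)
  leftovers = normalized.scan(/[^[:ascii:]]/).uniq
  [normalized, leftovers]
end

result, leftovers = normalize_to_ascii("𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖")
result    #=> "jovy debbie"
leftovers #=> []
```

If leftovers is not empty, those characters could be mapped explicitly with tr or rejected, depending on what the input is used for.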