pythonpython-3.xutf-8emojivader

Convert \u2764\ufe0f to UTF-8 using Python


I receive json data from an API:

json = {"lat": null, "body_text": "@edinburgh \u2764\ufe0f", "deduplicated_time": "2020-11-05T15:38:11.744710"}

I use Python to load the json message.

msg_body = json.loads(msg.body,strict=False)

I use VaderSentiment to extract the sentiment from the text on the body_text section of the json message.

Problem is that when red heart ❀ emoji is included as \u2764\ufe0f on the text Vader fails to predict the correct emotion. On their page they suggest that vader is translating utf-8 encoded emojis such as πŸ’˜ and πŸ’‹ and 😁. I believe that \u2764\ufe0f is not UTF-8 , how can I turn it UTF-8 using python?

If the following page emoji is correct the \u2764\ufe0f is "python src" encoding.


Solution

  • It’s a JSON encoded Unicode character. Decode the JSON, e.g. with json.loads, and you’ll get a Python string with a red heart. If you need to encode that to UTF-8 encoded bytes, use str.encode (though likely the library you want to use it with will want normal Python strs).