python unicode

When running a Python script I get – instead of a hyphen


I'm trying to fix a Python script. Every time a title contains a hyphen, it shows up as – and the error is:

KeyError: 'text here \xe2\x80\x93 text here'

The script interacts with an API, and the API has been set up with the hyphens as "start of guarded area" characters, so they are not actually hyphens. I have put these characters in the code, but when I run the script it doesn't recognize them. I already have # -*- coding: utf-8 -*- at the top of the script.

This isn't the entire script, of course, but this is where I would change the "-" to whatever it needs to be to make this work.

# -*- coding: utf-8 -*-

team_list = ["text here – text here",
             "text here – text here"] 

This is what is produced when run:

REQUEST @:text here – text here
STATUS: <Response [200]>>

Traceback (most recent call last):
  File "filepath here", line 102, in <module>
    request(url_list[i], team_list[i], team_data[i], team_count[i], team_name[i])
  File "filepath here", line 66, in request
    if rnamedata["data"][team]["incident"]["data"][0] == None:
KeyError: 'text here \xe2\x80\x93 text here'

I would expect it to return a hyphen and not – or \xe2\x80\x93.


Solution

  • The byte sequence b"\xe2\x80\x93" is the UTF-8 encoding of a Unicode en dash, U+2013. The character is '–', which looks almost identical to an ASCII hyphen-minus '-' (U+002D) but isn't. An en dash is wider.

    You are getting a KeyError because the key you look up contains that en dash, but no key in your data matches it exactly.

    Putting # -*- coding: UTF-8 -*- at the top of your program has no effect on the way your program reads data. It only tells the interpreter the encoding of your source code. In Python 3, UTF-8 is the default anyway (Python 2 assumes ASCII unless a declaration is present).
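To make the points above concrete, here is a minimal Python 3 sketch. It shows that the three bytes decode to an en dash, that reading them as Windows-1252 gives the – text, and one possible way to look up a key regardless of which dash it uses. The helpers `normalize_dashes` and `find_team` are hypothetical names, not part of your script or of any library, and the sample dictionary stands in for your API data, whose exact keys I can only guess at.

```python
import unicodedata

EN_DASH = "\u2013"

# The three bytes from the KeyError are the UTF-8 encoding of U+2013.
raw = b"\xe2\x80\x93"
decoded = raw.decode("utf-8")
print(unicodedata.name(decoded))

# Decoding the same bytes as Windows-1252 gives the three-character
# mojibake seen in the output (a-circumflex, euro sign, left double quote).
mojibake = raw.decode("cp1252")

# In Windows-1252 the single byte 0x96 is also an en dash. In Latin-1 that
# byte is the control character U+0096, which may be why the API side
# describes it as "start of guarded area" (an assumption on my part).
cp1252_dash = b"\x96".decode("cp1252")


def normalize_dashes(text):
    """Hypothetical helper: replace en dashes with ASCII hyphen-minus."""
    return text.replace(EN_DASH, "-")


def find_team(data, team):
    """Hypothetical helper: look up team, ignoring hyphen vs en dash."""
    if team in data:
        return data[team]
    wanted = normalize_dashes(team)
    for key, value in data.items():
        if normalize_dashes(key) == wanted:
            return value
    raise KeyError(team)


# Stand-in for the API response; the real keys may differ.
sample = {"text here - text here": {"incident": {"data": [None]}}}
result = find_team(sample, "text here \u2013 text here")
```

If the API data really uses the en dash and your list uses it too, the lookup should already match in Python 3. In Python 2 the literal in `team_list` is a byte string unless you prefix it with `u`, so it may not compare equal to the Unicode keys returned by a JSON parser.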