I have the following string:
>>> a = " "
>>> a.isspace()
False
>>> a
'\xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 \xe2\x80\x83\xe2\x80\x83 '
>>> print a
>>>
As we can see, when I print the string a, it appears to be nothing but spaces. However, a.isspace() returns False, so it does not recognize the string as whitespace. How can I detect that such a string is a "space string"?
You do not have a string containing only whitespace characters. You have a bytestring containing the UTF-8 encoding of a Unicode string containing only whitespace characters.
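To make that distinction concrete, here is a minimal sketch that inspects a shortened version of the bytes from the question. The variable names raw, text and names are made up for illustration, and the b'' literal is used so the snippet should behave the same way on Python 2.7 and Python 3.

```python
import unicodedata

# Shortened version of the question's data: the three-byte sequence
# \xe2\x80\x83 twice, followed by one plain ASCII space.
raw = b'\xe2\x80\x83\xe2\x80\x83 '

# Interpreting the bytes as UTF-8 gives the characters they encode.
text = raw.decode('utf-8')

# Look up the official Unicode name of each decoded character.
names = [unicodedata.name(ch) for ch in text]

print(len(raw))   # number of bytes
print(len(text))  # number of characters after decoding
print(names)
```

Each \xe2\x80\x83 triple is the UTF-8 encoding of a single character, U+2003 EM SPACE, so seven bytes decode to three characters. The byte-level isspace() only looks at individual bytes and does not recognize those three-byte sequences as whitespace.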
Decoding the bytes as UTF-8 produces a Unicode string for which isspace() returns True:
>>> a.decode('utf-8').isspace()
True
But don't just slap decode('utf-8') into your code ad hoc and hope it works.
Keep track of whether you're using Unicode or bytestrings at all times. Generally, work in Unicode, convert bytestring input to Unicode immediately, and only convert Unicode to bytestrings as it leaves your code.
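One way to follow that advice is to decode once at the input boundary and encode once at the output boundary, so everything in between only ever sees Unicode. The sketch below is only an illustration under assumptions: the helpers to_text, is_blank and to_bytes are hypothetical names, and UTF-8 is assumed as the encoding of the incoming data because that is what the question's bytes use.

```python
def to_text(data, encoding='utf-8'):
    """Hypothetical input-boundary helper: decode bytes, pass text through."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def is_blank(text):
    """Internal logic works on Unicode text only."""
    return text.isspace()


def to_bytes(text, encoding='utf-8'):
    """Hypothetical output-boundary helper: encode text as it leaves the code."""
    return text.encode(encoding)


# Bytes as they might arrive from a file or socket (assumed to be UTF-8).
incoming = b'\xe2\x80\x83\xe2\x80\x83 '

text = to_text(incoming)         # decode immediately on input
blank = is_blank(text)           # all checks happen on Unicode
round_tripped = to_bytes(text)   # encode only on the way out

print(blank)
```

Note that isspace() returns False for an empty string, so a check for "empty or whitespace only" would need to handle that case separately.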