I have just found this strange behaviour while parsing data from IANA.
"ǃ".isalpha() # returns True
"!".isalpha() # returns False
Apparently, the two exclamation marks are different:
In [62]: hex(ord("ǃ"))
Out[62]: '0x1c3'
In [63]: hex(ord("!"))
Out[63]: '0x21'
Is there a way to prevent this from happening? And what is the origin of this behaviour?
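Check the characters in the Unicode Character Database. The exclamation-like ǃ (U+01C3, written "\u01c3" in Python) is a letter with general category Lo, and str.isalpha() returns True for any character in one of the Unicode letter categories: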
import unicodedata
for c in "!ǃ":
    # Print the character, its code point in hex, its general category and its Unicode name
    print(c, '{:04x}'.format(ord(c)), unicodedata.category(c), unicodedata.name(c))
! 0021 Po EXCLAMATION MARK
ǃ 01c3 Lo LATIN LETTER RETROFLEX CLICK
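As for preventing it: if only the ASCII letters a-z and A-Z should count as alphabetic, one option is to combine str.isalpha() with str.isascii() (available from Python 3.7). The sketch below is only an illustration, and the helper names is_ascii_alpha and describe are hypothetical, not part of any library:

```python
import unicodedata


def is_ascii_alpha(text: str) -> bool:
    """Return True only if text is non-empty and made of ASCII letters.

    Hypothetical helper: str.isalpha() alone accepts every Unicode letter,
    so str.isascii() is added to restrict the check to a-z and A-Z.
    """
    return text.isascii() and text.isalpha()


def describe(ch: str) -> str:
    """Return the code point, general category and Unicode name of one character."""
    return "U+{:04X} {} {}".format(
        ord(ch), unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")
    )


for ch in ("!", "\u01c3", "a"):
    print(describe(ch), ch.isalpha(), is_ascii_alpha(ch))
```

Whether this is the right filter depends on the data: it also rejects legitimate non-ASCII letters such as "é", so it only fits input that is expected to be plain ASCII.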