python string-length

How to count a string's length in Python when the string contains escape sequences?


Given the string x = '1a\u0398\t\u03B43s'

how can I count its length as written in the source, using only code? Adding the r prefix to the literal by hand is not an option

(x = r'1a\u0398\t\u03B43s').

I tried this solution, but it doesn't work either (it counts 9 symbols, and the count should be 18):

x = '1a\\u0398\\t\\u03B43s'
decoded_s = x.encode().decode('unicode_escape')
print(f'Symbols: {len(decoded_s)}')

returns 9


Solution


You can't turn an existing string into a raw string after the fact, but you can re-escape it with the unicode_escape codec and convert the resulting bytes back to a string.

What you want to count is the text between the single quotes:

>>> x.encode("unicode_escape")
b'1a\\u0398\\t\\u03b43s'

Decoding with the same codec just gives back the original string, which is not what you're after:

>>> x.encode("unicode_escape").decode("unicode_escape")
'1aΘ\tδ3s'

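You can get a str instead by decoding the bytes as ASCII. With the double-backslash string from your attempt (x = '1a\\u0398\\t\\u03B43s'), the codec escapes every literal backslash again: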

>>> x.encode("unicode_escape").decode('ascii')
'1a\\\\u0398\\\\t\\\\u03B43s'
>>> len(x.encode("unicode_escape").decode('ascii'))
21

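The backslashes need a correction. The repr shows four backslashes per escape, but that is only two actual characters, and the text you want to count has one. A plain len(...) therefore overcounts by one for every doubled backslash. y.count("\\\\") matches exactly those two-character pairs, so subtracting it gives the right total. For the original x = '1a\u0398\t\u03B43s' there are no doubled backslashes, the count is 0, and the same expression still gives 18.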

>>> y = x.encode("unicode_escape").decode('ascii')
>>> len(y) - y.count("\\\\")
18
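
The steps above can be wrapped in a small helper. This is only a sketch: the name source_length is made up for illustration, and it assumes the string was typed with the same escapes that the unicode_escape codec produces (\uXXXX, \t and so on). Python does not store the original spelling of a literal, so a character typed differently, for example a non-ASCII letter entered directly, will be counted by its escaped form.

```python
def source_length(s: str) -> int:
    """Length of s as it would be typed in a non-raw literal.

    Hypothetical helper, not part of the standard library.
    """
    # Re-escape the string, then turn the resulting bytes back into a str.
    escaped = s.encode("unicode_escape").decode("ascii")
    # A literal backslash in s comes out as two backslashes,
    # so count each such pair only once.
    return len(escaped) - escaped.count("\\\\")


# The escapes were already interpreted when this literal was parsed.
print(source_length('1a\u0398\t\u03B43s'))
# This string contains literal backslashes instead.
print(source_length('1a\\u0398\\t\\u03B43s'))
```

Both calls print 18, so the helper gives the same answer whether or not the escapes were interpreted before the string reached it.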