
How to count a string's length in Python when the string includes escape sequences?


There is a string x = '1a\u0398\t\u03B43s'

How can I count its length as written, using only code? Manually adding an r prefix to the literal (x = r'1a\u0398\t\u03B43s') is not an option.

I have tried this solution, but it still doesn't work (it counts 9 symbols and should count 18):

x = '1a\\u0398\\t\\u03B43s'
decoded_s = x.encode().decode('unicode_escape')
print(f'Symbols: {len(decoded_s)}')

returns 9


Solution

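  • You can't turn an existing string back into a raw literal, but you can encode it with unicode_escape and decode the resulting bytes as ASCII, as follows.

    With x = '1a\u0398\t\u03B43s', you want to count the characters of the escaped form (each \\ in the output below is one backslash):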

    >>> x.encode("unicode_escape")
    b'1a\\u0398\\t\\u03b43s'
    

    Decoding with unicode_escape again is not what you're after, because it just restores the original characters:

    >>> x.encode("unicode_escape").decode("unicode_escape")
    '1aΘ\tδ3s'
    

    Instead, decode the bytes as ASCII to get a str. With x defined as in your attempt (x = '1a\\u0398\\t\\u03B43s'), every backslash comes out doubled:

    >>> x.encode("unicode_escape").decode('ascii')
    '1a\\\\u0398\\\\t\\\\u03B43s'
    >>> len(x.encode("unicode_escape").decode('ascii'))
    21
    

    The backslashes need one more step. Each \\ in your literal is a single real backslash in the string, and unicode_escape escapes it again, so it becomes two characters in y (displayed as \\\\ in the repr). len(...) therefore counts one character too many per backslash: 21 instead of 18. y.count("\\\\") counts exactly those two-character pairs, so subtract it:

    >>> y = x.encode("unicode_escape").decode('ascii')
    >>> len(y) - y.count("\\\\")
    18