
Why is there "TypeError: string indices must be integers" when using negative indices or slices in string formatting?


I would like to understand why this works fine:

>>> test_string = 'long brown fox jump over a lazy python'
>>> 'formatted "{test_string[0]}"'.format(test_string=test_string)
'formatted "l"'

Yet this fails:

>>> 'formatted "{test_string[-1]}"'.format(test_string=test_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> 'formatted "{test_string[11:14]}"'.format(test_string=test_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers

I know I could slice the variable before passing it in:

'formatted "{test_string}"'.format(test_string=test_string[11:14])

...but that is not possible in my situation.

I am working in a sandbox-like environment where a set of variables is passed to str.format() as a dictionary of keyword arguments. These variables are outside my control: I know their names and types in advance, but the format string is my only input. This works fine when I need to combine a few strings or control the precision of numbers, but it falls apart when I need to extract a substring.


Solution

  • Why it doesn't work

    This is explained in the Format String Syntax documentation for str.format():

    The arg_name can be followed by any number of index or attribute expressions. An expression of the form '.name' selects the named attribute using getattr(), while an expression of the form '[index]' does an index lookup using __getitem__().

    That is, you can index the string using bracket notation, and whatever you put inside the brackets becomes the argument of the string's __getitem__() method. This is indexing, not slicing: the parser never builds a slice object, so the 11:14 in {test_string[11:14]} is passed on as the plain string '11:14'. In short, str.format() simply doesn't support slicing inside a replacement field (the part between the braces), because that functionality isn't part of the specification.
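    To see exactly what str.format() hands to __getitem__(), here is a minimal sketch using a hypothetical Probe class (not part of any library) that records each key it receives and returns its repr:

```python
class Probe:
    """Hypothetical helper: records every key that str.format() passes to __getitem__()."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        return repr(key)  # show the key exactly as it was received


probe = Probe()
print('{p[0]} {p[-1]} {p[11:14]}'.format(p=probe))
# 0 '-1' '11:14'
print([type(key).__name__ for key in probe.keys])
# ['int', 'str', 'str']
```

    Only the all-digit index arrives as an int; both -1 and 11:14 arrive as strings.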

    Regarding negative indices, the grammar specifies:

    element_index     ::=  digit+ | index_string
    

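    This means that the index is either a sequence of digits (digit+), which is converted to an integer, or any other string (index_string), which is passed on unchanged. A negative index such as -1 is not a sequence of digits, because the minus sign is not a digit, so it is parsed as index_string and str.__getitem__() receives the string '-1'. However, str.__getitem__() only accepts integers (or slice objects). Hence the error TypeError: string indices must be integers (recent Python versions word it as string indices must be integers, not 'str').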
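    A short sketch of the consequence: the same [-1] field succeeds when the indexed object accepts string keys (an ordinary dict, used here purely for illustration) and fails on a str:

```python
test_string = 'long brown fox jump over a lazy python'

# '0' is all digits, so it becomes the int 0 and str.__getitem__(0) works.
first = '{s[0]}'.format(s=test_string)
print(first)  # l

# '-1' is an index_string, so a dict with the *string* key '-1' matches it.
from_dict = '{d[-1]}'.format(d={'-1': 'found by string key'})
print(from_dict)  # found by string key

# A str receives the string '-1' and rejects it.
try:
    '{s[-1]}'.format(s=test_string)
    rejected = False
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)  # exact wording varies between Python versions
```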

  • Solutions to the problem

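    Use f-strings (Python 3.6+), which evaluate the expression inside the braces as ordinary Python, so slicing and negative indices work (this requires writing code, not just supplying a format string)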

    >>> test_string = 'long brown fox jump over a lazy python'
    >>> f"formatted {test_string[0]}"
    'formatted l'
    >>> f"formatted {test_string[0:2]}"
    'formatted lo'
    >>> f"formatted {test_string[-1]}"
    'formatted n'
    

    Use str.format(), but slice the argument passed to str.format() rather than the replacement field (again, this only works if you control the call)

    >>> test_string = 'long brown fox jump over a lazy python'
    >>> 'formatted {replacement}'.format(replacement=test_string[0:2])
    'formatted lo'
    >>> 'formatted {replacement}'.format(replacement=test_string[-1])
    'formatted n'
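    If, as in the question, the format string is the only thing you control, neither approach above is available. A partial workaround, sketched below under the assumption that the substring's position is known and uses non-negative indices, stays within what the format syntax does support: a precision on a string field truncates it (giving a prefix), and digit-only indices are converted to integers, so a slice can be spelled out one character at a time. The slice_fields helper is hypothetical; it only builds such a format string.

```python
test_string = 'long brown fox jump over a lazy python'

# For strings, the precision is a maximum field width, so ':.4' keeps a prefix.
prefix = 'formatted "{test_string:.4}"'.format(test_string=test_string)
print(prefix)  # formatted "long"

# Digit-only indices become ints, so three single-character fields
# reproduce test_string[11:14] without any slice syntax.
middle = ('formatted "{test_string[11]}{test_string[12]}{test_string[13]}"'
          .format(test_string=test_string))
print(middle)  # formatted "fox"


def slice_fields(name, start, stop):
    """Hypothetical helper: build '{name[start]}...{name[stop - 1]}' (non-negative bounds only)."""
    return ''.join('{%s[%d]}' % (name, i) for i in range(start, stop))


template = 'formatted "' + slice_fields('test_string', 5, 10) + '"'
print(template.format(test_string=test_string))  # formatted "brown"
```

    This cannot express negative indices unless the string's length is known in advance, and any index past the end of the string raises IndexError.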