
x[0], y[0], and z[0] have the same memory address in CPython, so why doesn't a?


(env) λ python
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = ["test t", 2, 3]
>>> y = x
>>> z = x[:]
>>> x, y, z
(['test t', 2, 3], ['test t', 2, 3], ['test t', 2, 3])
>>> id(x[0]), id(y[0]), id(z[0])
(2818792500464, 2818792500464, 2818792500464)
>>> a = "test t"
>>> id(a)
2818792500720

I know a little about the string constant pool and string interning in Python. Why does a newly defined string that contains special characters, such as a space, get a different memory address?


Solution

  • Strings in Python are immutable: once a string is created, it cannot be changed. Because of that, CPython can safely reuse one string object for equal strings, so when you create the same string twice and assign it to two variables, both names may point to the same object in memory. For example,

    >>> a = 'hi'
    >>> b = 'hi'
    >>> id(a)
    437068484
    >>> id(b)
    437068484
    

    This reuse of string objects is called interning. Interned equal strings share one id, but Python does not guarantee that a string is interned. If a string is not a code object constant, or contains characters other than letters, digits, and underscores, you will usually see that the id() value is not reused.

    The next example assigns the same literal to two different names. Their ids differ because the string contains characters other than letters, digits, and underscores.

    >>> a = 'test_#@$'
    >>> b = 'test_#@$'
    >>> id(a)
    962262086
    >>> id(b)
    917208009
    

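    Now, looking at your example:

    x[0], y[0], and z[0] are the same object. After x = ["test t", 2, 3], the assignment y = x makes y an alias for the same list, and z = x[:] makes a new list (a shallow copy) whose elements are references to the same objects, so no new string is created. a = "test t" is entered as a separate statement in the interactive session, and because the string contains a space it is not interned automatically, so CPython builds a new string object with a different id.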

    Another example to make this clearer: create a list and then an alias for it, and both names have the same id.

    >>> a = [1,2,3]
    >>> b = a
    >>> id(a)
    1984438821696
    >>> id(b)
    1984438821696
    

    But when you create two equal lists separately, they are not the same object, unlike in the previous example.

    >>> a = [1,2,3]
    >>> b = [1,2,3]
    >>> id(a)
    1984438821696
    >>> id(b)
    1984438822336
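
The points above can also be checked in a script instead of the interactive prompt. This is only a minimal sketch: make_string is a hypothetical helper, not part of the question, that builds "test t" at run time so the compiler cannot share it as a constant. It uses `is` to compare identity and sys.intern to request interning explicitly.

```python
import sys


def make_string():
    # Hypothetical helper: build "test t" at run time so the result is a
    # newly created object rather than a constant shared by the compiler.
    return "".join(["test", " ", "t"])


x = [make_string(), 2, 3]
y = x              # alias: a second name for the same list object
z = x[:]           # shallow copy: a new list holding the same element references
a = make_string()  # an equal string, created separately

print(y is x)        # True: same list
print(z is x)        # False: different list
print(z[0] is x[0])  # True: the slice copied the reference, not the string
print(a == x[0])     # True: equal contents

# Whether two equal strings share one object is an implementation detail,
# so no particular result is assumed here.
print(a is x[0])

# sys.intern returns one canonical object per distinct string value.
print(sys.intern(a) is sys.intern(x[0]))  # True
```

Compare strings with == when you care about their contents. Use `is`, or compare id() values, only when you really want to know whether two names refer to the same object.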