Tags: python, python-3.x, operators, object-identity

'is' operator behaves differently when comparing strings with spaces


I've started learning Python (Python 3.3) and I was trying out the 'is' operator. I tried this:

>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False

It seems like the space and the question mark make 'is' behave differently. What's going on?

EDIT: I know I should be using ==; I just wanted to know why 'is' behaves like this.


Solution

  • Warning: this answer describes implementation details of a specific Python interpreter (CPython). Comparing strings with 'is' is a bad idea; use == to compare values.

    At least for CPython 3.4 and 2.7.3, the answer is "no, it is not just the whitespace". Whether two equal string literals end up as the same object depends on several things:

    Examples

    Alphanumeric string literals (ASCII letters, digits, and underscores only) are interned, so equal literals always share memory:

    >>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
    >>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
    >>> x is y
    True
    

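    Non-alphanumeric string literals share memory if and only if they are compiled as part of the same enclosing syntactic block (for example, the same interactive line, or the same function or module body):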

    (interpreter)

    >>> x='`!@#$%^&*() \][=-. >:"?<a'; y='`!@#$%^&*() \][=-. >:"?<a';
    >>> z='`!@#$%^&*() \][=-. >:"?<a';
    >>> x is y
    True 
    >>> x is z
    False 
    

    (file)

    x='`!@#$%^&*() \][=-. >:"?<a';
    y='`!@#$%^&*() \][=-. >:"?<a';
    z=(lambda : '`!@#$%^&*() \][=-. >:"?<a')()
    print(x is y)
    print(x is z)
    

    Output: True and False
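
One way to see why literals in the same block are shared is to look at the constants stored for a compiled unit. The following is a minimal sketch, assuming CPython; the sample string and the names `literal`, `code`, and `ns` are illustrative only and are not part of the original answer.

```python
# Assumes CPython: equal constants within one compiled unit are stored once.
literal = "`!@# ?<a"  # hypothetical non-alphanumeric sample string

# Source text in which the same literal appears twice.
source = "x = {0!r}\ny = {0!r}\n".format(literal)
code = compile(source, "<demo>", "exec")

# Count how many constants of the code object equal the literal.
occurrences = [c for c in code.co_consts if c == literal]
print(len(occurrences))

# Run the compiled unit and compare the identity of the two names.
ns = {}
exec(code, ns)
same_block = ns["x"] is ns["y"]
print(same_block)
```

Both names are bound to the single constant object held by the code object, which is why the identity check succeeds inside one block. A separately compiled unit gets its own constants, so the result there depends on whether the interpreter interns the string.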

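    For simple binary operations on constants, the compiler does very simple constant folding (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. When folding applies, the result is treated like a literal and the rules mentioned earlier are in force; expressions that involve variables or method calls are evaluated at run time and produce new objects: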

    >>> 'a'*10+'a'*10 is 'a'*20
    True
    >>> 'a'*21 is 'a'*21
    False
    >>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
    False
    >>> t=2; 'a'*t is 'aa'
    False
    >>> 'a'.__add__('a') is 'aa'
    False
    >>> x='a' ; x+='a'; x is 'aa'
    False
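
The difference between folded and run-time expressions can be checked by inspecting a code object's constants. This is a rough sketch, assuming CPython; the names `folded` and `runtime` are made up for illustration, and the length limit for folding may differ between interpreter versions, so a short string is used here.

```python
# Assumes CPython: constant expressions are folded at compile time,
# while expressions involving variables are evaluated at run time.
folded = compile("x = 'a' * 3", "<demo>", "exec")
runtime = compile("t = 3\nx = 'a' * t", "<demo>", "exec")

# If folding happened, the finished string is stored as a constant.
folded_has_result = "aaa" in folded.co_consts
runtime_has_result = "aaa" in runtime.co_consts
print(folded_has_result, runtime_has_result)
```

When the finished string is already among the constants, it behaves like any other literal in that block. When it is not, the string is built each time the code runs, so identity with a literal is not to be expected.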
    

    Single-character strings always share memory, of course, because CPython caches them (at least for characters in the Latin-1 range):

    >>> chr(0x20) is ' '
    True
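
Since all of the above depends on the interpreter, code that needs a reliable answer should compare with == or intern explicitly. A small sketch using `sys.intern` from the standard library (Python 3; in Python 2 the equivalent is the built-in `intern`); the variable names are illustrative.

```python
import sys

# Build two equal strings at run time so that no literal sharing applies.
a = " ".join(["is", "it", "the", "space?"])
b = " ".join(["is", "it", "the", "space?"])

equal = a == b       # value comparison: what you normally want
identical = a is b   # identity comparison: two separate objects here

# sys.intern returns one canonical object per distinct string value.
interned_identical = sys.intern(a) is sys.intern(b)
print(equal, identical, interned_identical)
```

Explicit interning makes the identity check meaningful, but == remains the simpler and safer choice for comparing string contents.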