
How to create the int 1 at two different memory locations?


I want to show someone how using is instead of == to compare integers can fail. I thought this would work, but it didn't:

>>> import copy
>>> x = 1
>>> y = copy.deepcopy(x)
>>> x is y
True

I can do this easily for bigger integers:

>>> x = 500
>>> y = 500
>>> x is y
False

How can I demonstrate the same thing with smaller integers, the kind that might typically be used for enum-like purposes in Python?


Solution

  • Small integers ranging from -5 to 256 are meant to always be cached in CPython's implementation: when you create an int in that range, you actually get back a reference to the existing object (ref). The behavior demonstrated by @casevh's code, where a modulo operation involving a large int produced a new small int, was considered a "bug" and was fixed in 3.11 by making functions that can produce small ints call maybe_small_long, so that they return the cached objects when applicable.
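A quick way to watch the cache at work, sketched under the assumption that you are running CPython (other implementations such as PyPy treat `is` on ints differently): build each int from a string at runtime, so the compiler cannot fold both operands into one shared constant, then compare identities. `is_cached` is a hypothetical helper name used only for this illustration.

```python
import platform


def is_cached(value: int) -> bool:
    """True if two independently parsed copies of `value` are the same object.

    Each int(str(...)) call runs at runtime, so the compiler cannot merge the
    two operands into one shared constant; identity can only come from the cache.
    """
    return int(str(value)) is int(str(value))


if platform.python_implementation() == "CPython":
    print(is_cached(1))      # True: 1 is in the small-int cache
    print(is_cached(10**6))  # False: each call allocates a new object
    # Scan a window around the documented bounds and report what this interpreter caches.
    cached = [v for v in range(-10, 300) if is_cached(v)]
    print(min(cached), max(cached))
```

Scanning a window instead of hard-coding the bounds lets the sketch report whatever range the running interpreter actually caches.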

    With the "bug" now fixed, I suspect that the only viable way to have two small ints of the same value at different addresses with a non-custom CPython 3.11+ build is to modify an existing int object with direct memory access.

    In CPython, an int object is a PyLongObject, which is a typedef for struct _longobject:

    typedef struct _longobject PyLongObject;
    

    And struct _longobject is defined as:

    struct _longobject {
        PyObject_HEAD
        _PyLongValue long_value;
    };
    

    where PyObject_HEAD is a macro that expands to PyObject ob_base;, and PyObject is a typedef for struct _object, which in a default (non-free-threaded) CPython 3.12 build is defined, after stripping the macros, as:

    struct _object {
        union {
           Py_ssize_t ob_refcnt;
           PY_UINT32_T ob_refcnt_split[2];
        };
        PyTypeObject *ob_type;
    };
    

    And _PyLongValue is a type defined as:

    typedef struct _PyLongValue {
        uintptr_t lv_tag; /* Number of digits, sign and flags */
        digit ob_digit[1];
    } _PyLongValue;
    

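    where digit is defined as uint32_t on typical builds, which use 30 bits per digit (builds configured for 15-bit digits use a 16-bit unsigned short instead).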
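To make the "one digit" point concrete, here is a pure-Python sketch that splits a non-negative int into the base-2**30 digits CPython keeps in ob_digit, least significant first. `to_digits` and `from_digits` are hypothetical helper names, not CPython APIs, and the default of 30 bits is an assumption; `sys.int_info.bits_per_digit` and `sys.int_info.sizeof_digit` report the actual values for the running interpreter.

```python
from __future__ import annotations

import sys


def to_digits(magnitude: int, bits: int = 30) -> list[int]:
    """Split a non-negative int into base-2**bits digits, least significant first."""
    if magnitude < 0:
        raise ValueError("the sign is stored separately; pass abs(value)")
    digits = []
    while magnitude:
        digits.append(magnitude & ((1 << bits) - 1))  # keep the low `bits` bits
        magnitude >>= bits
    return digits  # zero yields [], i.e. a digit count of 0


def from_digits(digits: list[int], bits: int = 30) -> int:
    """Inverse of to_digits: rebuild the magnitude from its digits."""
    return sum(d << (bits * i) for i, d in enumerate(digits))


print(sys.int_info.bits_per_digit, sys.int_info.sizeof_digit)  # typically: 30 4
print(to_digits(257))    # [257]: fits in a single digit
print(to_digits(2**30))  # [0, 1]: the smallest value needing a second digit
```

This is why overwriting only ob_digit[0] is enough for every value used in this answer.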

    Any magnitude below 2**30 fits in the single digit ob_digit[0], which sits, on a 64-bit platform, at 8 bytes for ob_refcnt + 8 bytes for ob_type + 8 bytes for lv_tag = 24 bytes from the start of the PyLongObject. Since id() returns an object's address in CPython, we can use ctypes.c_int32.from_address(id(n) + 24) to access ob_digit of a PyLongObject and change its value:
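Rather than hard-coding 24, the offsets can be derived from a ctypes.Structure that mirrors the struct definitions above. This is a read-only sketch under explicit assumptions: CPython, a default (non-free-threaded) build, and the 3.12 field names (in 3.11 and older the third field is ob_size, which has the same size and offset). `LongHeader`, `digit_offset` and `layout_ok` are names made up for this example, and the `sys.getsizeof` comparison is only a heuristic sanity check: a one-digit int should occupy exactly the header plus one digit.

```python
import ctypes
import platform
import sys

# Pick the C type that matches `digit` on this build (uint32_t for 30-bit digits).
DIGIT = ctypes.c_uint32 if sys.int_info.sizeof_digit == 4 else ctypes.c_uint16


class LongHeader(ctypes.Structure):
    """Hypothetical mirror of PyLongObject (default build, CPython 3.12 names)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # Py_ssize_t ob_refcnt
        ("ob_type", ctypes.c_void_p),     # PyTypeObject *ob_type
        ("lv_tag", ctypes.c_size_t),      # uintptr_t lv_tag (ob_size before 3.12)
        ("ob_digit", DIGIT * 1),          # digit ob_digit[1]
    ]


digit_offset = LongHeader.ob_digit.offset
print(digit_offset)  # 24 on a typical 64-bit build

n = int("257")  # a fresh, uncached int with a single digit
# Sanity check: a one-digit int should be the header plus exactly one digit.
layout_ok = (
    platform.python_implementation() == "CPython"
    and sys.getsizeof(n) == digit_offset + ctypes.sizeof(DIGIT)
)
if layout_ok:
    view = LongHeader.from_address(id(n))  # view of n's memory; only read here
    print(view.ob_digit[0])                 # 257
    print(view.ob_type == id(int))          # True: ob_type points at int
```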

    import ctypes
    
    m = 257
    n = int("257")
    ctypes.c_int32.from_address(id(n) + 24).value = 1
    print(n == 1) # outputs True
    print(n is 1) # outputs False (Python 3.8+ also emits a SyntaxWarning for "is" with a literal)
    print(m)  # other instances of 257 were not mutated
    


    Note that lv_tag stores the digit count shifted left by 3 bits, with the sign in its lowest two bits (0 for positive, 1 for zero, 2 for negative). So to turn a positive integer into a negative one, we can access lv_tag at the offset of 8 + 8 = 16 bytes from the start of the PyLongObject and set its sign bits to 2:

    import ctypes
    
    n = int("257")
    ctypes.c_int32.from_address(id(n) + 24).value = 5
    lv_tag = ctypes.c_int64.from_address(id(n) + 16)
    lv_tag.value = (lv_tag.value & ~3) | 2  # clear the two sign bits, then mark as negative
    print(n == -5) # outputs True
    print(n is -5) # outputs False
    


    and zero is stored with a digit count of 0 and the lowest two bits of lv_tag set to 1, so lv_tag has an overall value of 1:

    import ctypes
    
    n = int("257")
    ctypes.c_int32.from_address(id(n) + 24).value = 0
    lv_tag = ctypes.c_int64.from_address(id(n) + 16)
    lv_tag.value = 1
    print(n == 0) # outputs True
    print(n is 0) # outputs False
    

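The lv_tag encoding can be decoded with two bit operations. A sketch, assuming the CPython 3.12+ layout (digit count in the bits above the lowest three, sign in the lowest two: 0 positive, 1 zero, 2 negative); `decode_lv_tag`, `encode_lv_tag` and `live` are hypothetical names. The pure functions work on any interpreter; the read from a live object is only attempted when the interpreter looks like a default CPython 3.12+ build.

```python
from __future__ import annotations

import ctypes
import platform
import sys

NON_SIZE_BITS = 3  # the lowest three bits of lv_tag are not part of the digit count
SIGN_MASK = 3      # the lowest two bits hold the sign: 0 positive, 1 zero, 2 negative


def decode_lv_tag(tag: int) -> tuple[int, int]:
    """Return (digit_count, sign) for a CPython 3.12+ lv_tag, with sign in {1, 0, -1}."""
    return tag >> NON_SIZE_BITS, 1 - (tag & SIGN_MASK)


def encode_lv_tag(digit_count: int, sign: int) -> int:
    """Inverse of decode_lv_tag; the reserved third bit is left clear."""
    return (digit_count << NON_SIZE_BITS) | (1 - sign)


print(decode_lv_tag(8))   # (1, 1): one digit, positive, e.g. 257
print(decode_lv_tag(10))  # (1, -1): one digit, negative, e.g. -5 or -257
print(decode_lv_tag(1))   # (0, 0): zero

# Read a live lv_tag only where the assumed layout seems to hold.
PTR = ctypes.sizeof(ctypes.c_void_p)
live = None
if (
    platform.python_implementation() == "CPython"
    and sys.version_info >= (3, 12)
    and sys.getsizeof(int("257")) == 3 * PTR + sys.int_info.sizeof_digit
):
    n = int("-257")
    live = decode_lv_tag(ctypes.c_size_t.from_address(id(n) + 2 * PTR).value)
print(live)
```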

    Prior to CPython 3.12, however, the same field at offset 16 is ob_size, which holds a signed digit count: negative integers are stored with a negative ob_size instead of sign bits in what is now lv_tag. So to turn a positive integer into a negative one in CPython 3.11 or an older version:

    import ctypes
    
    n = int("257")
    ctypes.c_int32.from_address(id(n) + 24).value = 5
    ob_size = ctypes.c_int64.from_address(id(n) + 16)
    ob_size.value = -ob_size.value
    print(n == -5) # outputs True
    print(n is -5) # outputs False
    


    and zero is stored with a zero ob_size:

    import ctypes
    
    n = int("257")
    ctypes.c_int32.from_address(id(n) + 24).value = 0
    ob_size = ctypes.c_int64.from_address(id(n) + 16)
    ob_size.value = 0
    print(n == 0) # outputs True
    print(n is 0) # outputs False
    

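The snippets above can be folded into one helper that picks the sign encoding for the running version. This is a sketch under the same assumptions they make (CPython, a default non-free-threaded build, and a magnitude that fits in one digit, i.e. below 2**30 on typical builds), plus two guards: a `sys.getsizeof` layout heuristic and a check that the template int is freshly allocated, so it refuses to write when those assumptions look wrong. `make_uncached_int`, `layout_supported` and `TEMPLATE` are hypothetical names. Instead of 257 it starts from the largest single-digit value, which stays far outside the small-int cache.

```python
import ctypes
import platform
import sys

PTR = ctypes.sizeof(ctypes.c_void_p)
DIGIT = ctypes.c_uint32 if sys.int_info.sizeof_digit == 4 else ctypes.c_uint16
DIGIT_BITS = sys.int_info.bits_per_digit
# Largest magnitude that fits in one digit; far outside the small-int cache.
TEMPLATE = str((1 << DIGIT_BITS) - 1)


def layout_supported() -> bool:
    """Heuristic: CPython whose one-digit ints are exactly 3 pointers + 1 digit."""
    if platform.python_implementation() != "CPython":
        return False
    return sys.getsizeof(int(TEMPLATE)) == 3 * PTR + ctypes.sizeof(DIGIT)


def make_uncached_int(value: int) -> int:
    """Return a fresh int object equal to `value` (|value| < 2**bits_per_digit)."""
    if not layout_supported():
        raise RuntimeError("unexpected interpreter or PyLongObject layout")
    if abs(value) >= 1 << DIGIT_BITS:
        raise ValueError("this sketch only handles single-digit magnitudes")
    n = int(TEMPLATE)
    if n is int(TEMPLATE):
        raise RuntimeError("template int is shared; refusing to mutate it")
    base = id(n)
    # Overwrite the single digit with the magnitude.
    DIGIT.from_address(base + 3 * PTR).value = abs(value)
    # Same word, two meanings: lv_tag on 3.12+, ob_size on 3.11 and older.
    size_word = ctypes.c_ssize_t.from_address(base + 2 * PTR)
    if sys.version_info >= (3, 12):
        if value == 0:
            size_word.value = 1  # zero digits, sign bits = 1 (zero)
        else:
            # Keep the digit count, replace the two sign bits (0 positive, 2 negative).
            size_word.value = (size_word.value & ~3) | (2 if value < 0 else 0)
    else:
        # Signed digit count: 1 for positive, -1 for negative, 0 for zero.
        size_word.value = (value > 0) - (value < 0)
    return n


if layout_supported():
    for v in (0, 1, -5, 100):
        n = make_uncached_int(v)
        print(v, n == v, n is v)  # == is True, `is` is False
```

Building the template with int(TEMPLATE) from a string at runtime means every call mutates its own newly allocated object rather than a constant shared with other code.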