How to create the int 1 at two different memory locations?

I want to show someone how using is instead of == to compare integers can fail. I thought this would work, but it didn't:

>>> import copy
>>> x = 1
>>> y = copy.deepcopy(x)
>>> x is y
True

I can do this easily for bigger integers:

>>> x = 500
>>> y = 500
>>> x is y
False

How can I demonstrate the same thing with smaller integers, which might typically be used for enum-like purposes in Python?

Solution

Small integers from -5 to 256 are meant to always be cached in CPython's implementation: when you create an int in that range, you actually get back a reference to the existing cached object. The behavior demonstrated by @casevh's code, where a modulo operation involving a large int produced a new small int, was actually considered a "bug". It was fixed in 3.11 by making functions that can potentially produce small ints call maybe_small_long, so that they return the cached objects when applicable.
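A quick way to see the cache boundaries, as a sketch assuming CPython: parse the same text twice with int(), so the compiler cannot fold both operands into a single shared constant, and compare identities. The helper name shares_cached_object is hypothetical.

```python
def shares_cached_object(text):
    """Return True if two separately parsed ints with this value are one object.

    int() runs at call time, so each call builds its own result unless the
    value is served from CPython's small-int cache.
    """
    return int(text) is int(text)


for value in (-6, -5, 0, 1, 256, 257):
    print(value, shares_cached_object(str(value)))
```

On CPython, -5 through 256 should report True, while the neighbors -6 and 257 report False.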

With the "bug" now fixed, I suspect that the only viable way to have two small ints of the same value at different addresses with a non-custom CPython 3.11+ build is to modify an existing int object with direct memory access.
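This also explains why the copy.deepcopy attempt in the question returned the same object: the copy module treats int as immutable and hands back the original, whatever its value, and int() applied to an exact int does the same. A small check, assuming CPython:

```python
import copy

x = int("500")  # built at runtime, outside the small-int cache

print(copy.copy(x) is x)      # copy.copy returns immutable ints unchanged
print(copy.deepcopy(x) is x)  # deepcopy treats int as atomic and returns it as-is
print(int(x) is x)            # int() on an exact int returns the same object
```

All three lines should print True, so none of these routes can produce a second object with the same value.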

In CPython, an int object is a PyLongObject, which is a typedef of struct _longobject:

typedef struct _longobject PyLongObject;

And struct _longobject is defined as:

struct _longobject {
    PyObject_HEAD
    _PyLongValue long_value;
};

where PyObject_HEAD is a macro that expands to PyObject ob_base;, and PyObject is a typedef of struct _object, which, after stripping the macros, is defined as:

struct _object {
    union {
       Py_ssize_t ob_refcnt;
       PY_UINT32_T ob_refcnt_split[2];
    };
    PyTypeObject *ob_type;
};
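The same header can be mirrored with ctypes to peek at a live object. This is a sketch assuming a standard (non-debug, non-free-threaded) CPython build, where id() returns the object's address; PyObjectHead is only an illustrative name, not a CPython API.

```python
import ctypes


class PyObjectHead(ctypes.Structure):
    # Mirrors struct _object on a standard build: the ob_refcnt /
    # ob_refcnt_split union is pointer-sized, followed by the ob_type pointer.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]


n = int("257")
head = PyObjectHead.from_address(id(n))  # a view onto the live object, not a copy
print(head.ob_type == id(int))           # ob_type points at the int type object
print(PyObjectHead.ob_type.offset)       # 8 on a 64-bit build
```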

And _PyLongValue is a type defined as:

typedef struct _PyLongValue {
    uintptr_t lv_tag; /* Number of digits, sign and flags */
    digit ob_digit[1];
} _PyLongValue;

where digit is defined as uint32_t on builds with the default 30-bit digits (builds configured for 15-bit digits use unsigned short instead).
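Putting the pieces together, struct _longobject can be mirrored as a ctypes.Structure so that ctypes computes the field offsets instead of us counting bytes by hand. A sketch, assuming a 64-bit CPython 3.12+ build with 30-bit digits; LongObjectView is a hypothetical name. On 3.11 and older the word at the same position is the signed ob_size, but the offsets are the same.

```python
import ctypes
import sys


class LongObjectView(ctypes.Structure):
    # Sketch of struct _longobject: PyObject header, lv_tag, then the digits.
    # Only the first digit is declared; larger ints store more digits after it.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("lv_tag", ctypes.c_size_t),        # uintptr_t: digit count, sign and flags
        ("ob_digit", ctypes.c_uint32 * 1),  # digit is uint32_t with 30-bit digits
    ]


assert sys.int_info.sizeof_digit == 4, "this sketch assumes 30-bit digits"

n = int("257")
view = LongObjectView.from_address(id(n))
print(LongObjectView.lv_tag.offset, LongObjectView.ob_digit.offset)
print(view.ob_digit[0])  # 257 fits in a single 30-bit digit
```

On a 64-bit build the two offsets print as 16 and 24, matching the hand calculation.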

Since a value below 2**30 fits entirely in the first digit, ob_digit[0] (a 32-bit field holding 30 bits of the value), we can calculate its offset from the address of a PyLongObject on a 64-bit, non-debug build as 8 bytes for ob_refcnt, 8 bytes for ob_type and 8 bytes for lv_tag, for a total of 24 bytes. Because id() returns an object's memory address in CPython, we can then use ctypes.c_int32.from_address to access ob_digit[0] of a PyLongObject and change its value.

import ctypes

m = 257
n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 1
print(n == 1) # outputs True
print(n is 1) # outputs False
print(m)  # other instances of 257 were not mutated

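To tie this back to the enum-like use case in the question, the following sketch turns a fresh int into 1 and shows that == and dict lookups still treat it as 1 while an identity check fails. It assumes a standard 64-bit CPython build with 30-bit digits and the 24-byte offset described above; STATUS_OK, check_with_is and check_with_eq are hypothetical names. Only ob_digit[0] is written, so the sign and digit count of the original 257 (positive, one digit) are reused, which holds for both the 3.12+ and the older layout.

```python
import ctypes
import sys

# Assumes a 64-bit build with 30-bit digits, where ob_digit[0] is at offset 24.
assert sys.int_info.sizeof_digit == 4 and ctypes.sizeof(ctypes.c_void_p) == 8

STATUS_OK = 1  # enum-like constant; this 1 is the cached small-int object


def check_with_is(status):
    return status is STATUS_OK


def check_with_eq(status):
    return status == STATUS_OK


status = int("257")                                      # fresh, uncached object
ctypes.c_uint32.from_address(id(status) + 24).value = 1  # it now holds the value 1

print(check_with_eq(status))           # the values compare equal
print({STATUS_OK: "ok"}.get(status))   # hashing and == agree, so the lookup succeeds
print(check_with_is(status))           # but it is not the cached 1 object
```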

Note that in CPython 3.12+, negative integers are stored with the lowest two bits of lv_tag set to 2, so to turn a positive integer into a negative one we can access lv_tag at an offset of 8 + 8 = 16 bytes from the start of the PyLongObject and modify it:

import ctypes

n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 5  # ob_digit[0] = 5
lv_tag = ctypes.c_int64.from_address(id(n) + 16)
lv_tag.value = (lv_tag.value & ~3) | 2  # clear the two sign bits, then set them to 2 (negative)
print(n == -5) # outputs True
print(n is -5) # outputs False

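Reading lv_tag back makes the encoding easier to check before writing to it. A read-only sketch for CPython 3.12+ on a 64-bit build; decode_lv_tag is a hypothetical helper, and SIGN_MASK and NON_SIZE_BITS are named after the corresponding internal macros in CPython's pycore_long.h.

```python
import ctypes
import sys

# lv_tag sits 16 bytes into an int object on a 64-bit 3.12+ build. Its lowest
# two bits encode the sign (0 positive, 1 zero, 2 negative); the digit count
# is stored from bit 3 upward.
SIGN_MASK = 3
NON_SIZE_BITS = 3


def decode_lv_tag(n):
    tag = ctypes.c_size_t.from_address(id(n) + 16).value
    sign = 1 - (tag & SIGN_MASK)  # maps 0 / 1 / 2 to +1 / 0 / -1
    ndigits = tag >> NON_SIZE_BITS
    return sign, ndigits


if sys.version_info >= (3, 12):
    for text in ("257", "0", "-257", str(2 ** 40)):
        print(text, decode_lv_tag(int(text)))
```

On 3.12+ this should give (1, 1) for 257, (0, 0) for 0, (-1, 1) for -257 and (1, 2) for 2 ** 40, which needs two 30-bit digits.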

and zero is stored with 0 digits and the lowest two bits of lv_tag set to 1, so lv_tag has an overall value of 1:

import ctypes

n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 0
lv_tag = ctypes.c_int64.from_address(id(n) + 16)
lv_tag.value = 1
print(n == 0) # outputs True
print(n is 0) # outputs False


Prior to CPython 3.12, however, the field at offset 16 is ob_size rather than lv_tag, and a negative integer is stored with a negative digit count there. So, to turn a positive integer into a negative one in CPython 3.11 or older:

import ctypes

n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 5
ob_size = ctypes.c_int64.from_address(id(n) + 16)
ob_size.value = -ob_size.value
print(n == -5) # outputs True
print(n is -5) # outputs False


and zero is stored with a zero ob_size:

import ctypes

n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 0
ob_size = ctypes.c_int64.from_address(id(n) + 16)
ob_size.value = 0
print(n == 0) # outputs True
print(n is 0) # outputs False

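The four recipes can be folded into one version-aware helper. This is a sketch rather than a supported technique: it assumes a standard 64-bit CPython build with 30-bit digits, uses the offsets described above, and only handles values that fit in a single digit. The name uncached_int is hypothetical.

```python
import ctypes
import sys

# Assumes a 64-bit build with 30-bit digits: the tag/size word at offset 16
# and ob_digit[0] at offset 24.
assert sys.int_info.sizeof_digit == 4 and ctypes.sizeof(ctypes.c_void_p) == 8


def uncached_int(value):
    """Return a new int object equal to value that is not the cached one."""
    if abs(value) >= 1 << sys.int_info.bits_per_digit:
        raise ValueError("value must fit in a single digit")
    n = int("257")  # fresh one-digit object that nothing else references
    ndigits = 0 if value == 0 else 1
    ctypes.c_uint32.from_address(id(n) + 24).value = abs(value)
    if sys.version_info >= (3, 12):
        # lv_tag: sign in the low two bits (0 positive, 1 zero, 2 negative),
        # digit count from bit 3 upward
        sign_bits = 1 if value == 0 else (2 if value < 0 else 0)
        ctypes.c_size_t.from_address(id(n) + 16).value = (ndigits << 3) | sign_bits
    else:
        # ob_size: the digit count, negated for negative values
        size = -ndigits if value < 0 else ndigits
        ctypes.c_ssize_t.from_address(id(n) + 16).value = size
    return n


for v in (1, 0, -5):
    x = uncached_int(v)
    print(v, x == v, x is int(str(v)))
```

Each line should show the value compares equal but is not the cached object.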