I want to show someone how using is instead of == to compare integers can fail. I thought this would work, but it didn't:
>>> import copy
>>> x = 1
>>> y = copy.deepcopy(x)
>>> x is y
True
I can do this easily for bigger integers:
>>> x = 500
>>> y = 500
>>> x is y
False
How can I demonstrate the same thing with smaller integers, which might typically be used for enum-like purposes in Python?
Small integers ranging from -5 to 256 are supposed to be always cached in CPython's implementation. When you create an int in that range you actually just get back a reference to the existing object (ref). The behavior demonstrated by @casevh's code, where a modulo operation involving a large int produces a new small int, was actually considered a "bug" and was fixed in 3.11 by ensuring that functions that can potentially produce small ints call maybe_small_long to return cached objects when applicable.
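A quick way to see the cache at work, as a minimal sketch assuming a standard CPython build. The helper name same_object is made up for illustration, and int(str(...)) is used so that both values are built at run time rather than shared as compile-time constants:

```python
def same_object(value):
    """Build `value` twice at run time and report whether both results are one object."""
    a = int(str(value))
    b = int(str(value))
    return a is b

# Values inside CPython's small-int cache come back as the same object.
print(same_object(-5), same_object(0), same_object(256))

# A value far outside the cached range gets a fresh object on each call.
print(same_object(10**20))
```

This is implementation-specific behavior; other Python implementations are free to behave differently.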
With the "bug" now fixed, I suspect that the only viable way to have two small ints of the same value at different addresses with a non-custom CPython 3.11+ build is to modify an existing int object with direct memory access.
As we know, an int object is stored with the type PyLongObject in CPython, which is defined as a type of struct _longobject:
typedef struct _longobject PyLongObject;
And struct _longobject is defined as:
struct _longobject {
    PyObject_HEAD
    _PyLongValue long_value;
};
where PyObject_HEAD is a macro defined as PyObject ob_base; and PyObject is a type defined as struct _object, which is defined, after stripping the macros, as:
struct _object {
    union {
        Py_ssize_t ob_refcnt;
        PY_UINT32_T ob_refcnt_split[2];
    };
    PyTypeObject *ob_type;
};
And _PyLongValue is a type defined as:
typedef struct _PyLongValue {
    uintptr_t lv_tag; /* Number of digits, sign and flags */
    digit ob_digit[1];
} _PyLongValue;
where digit is a type defined as uint32_t.
Since the value of an integer is stored in the 32-bit ob_digit, we can calculate its offset from the address of a PyLongObject, on a 64-bit platform, as 8 bytes for ob_refcnt, 8 bytes for ob_type and 8 bytes for lv_tag, for a total of 24 bytes, and use ctypes.c_int32.from_address to access ob_digit of a PyLongObject and change its value:
import ctypes
m = 257
n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 1
print(n == 1) # outputs True
print(n is 1) # outputs False
print(m) # other instances of 257 were not mutated
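The 24-byte offset used above can be double-checked by mirroring the struct definitions with ctypes and letting it compute the field offsets. This is only a sketch: PyLongLayout is a hypothetical name, c_size_t stands in for uintptr_t, and the layout assumes a default (non-free-threaded) CPython 3.12+ build.

```python
import ctypes

class PyLongLayout(ctypes.Structure):
    """Hypothetical mirror of struct _longobject (CPython 3.12+, default build)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),    # Py_ssize_t
        ("ob_type", ctypes.c_void_p),       # PyTypeObject *
        ("lv_tag", ctypes.c_size_t),        # uintptr_t (assumed same width as size_t)
        ("ob_digit", ctypes.c_uint32 * 1),  # digit ob_digit[1]
    ]

# On a 64-bit platform these are expected to be 16 and 24.
print(PyLongLayout.lv_tag.offset)
print(PyLongLayout.ob_digit.offset)
```

The class only describes the layout; it does not touch any live int object.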
Note that negative integers are stored with the lower two bits of lv_tag set to 2, so if we want to modify a positive integer into a negative one, we can access lv_tag at the offset of 8 + 8 = 16 bytes from the PyLongObject to perform the modification:
import ctypes
n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 5
lv_tag = ctypes.c_int64.from_address(id(n) + 16)
lv_tag.value = (lv_tag.value & ~3) | 2  # clear the two sign bits, then set them to 2 (negative)
print(n == -5) # outputs True
print(n is -5) # outputs False
and zero is stored with 0 digits and the lower two bits of lv_tag set to 1, so lv_tag would have an overall value of 1:
import ctypes
n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 0
lv_tag = ctypes.c_int64.from_address(id(n) + 16)
lv_tag.value = 1
print(n == 0) # outputs True
print(n is 0) # outputs False
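To make the lv_tag encoding easier to follow, here is a small pure-Python sketch that decodes a tag value without touching memory. It assumes, from my reading of the CPython 3.12 headers, that the digit count sits above the three low bits and that sign bits of 0 mean positive; decode_lv_tag is a made-up helper name.

```python
def decode_lv_tag(lv_tag):
    """Split a CPython 3.12+ lv_tag value into (digit count, sign)."""
    sign_bits = lv_tag & 3   # lower two bits: 0 positive, 1 zero, 2 negative
    ndigits = lv_tag >> 3    # assumed: digit count sits above the three low bits
    sign = {0: 1, 1: 0, 2: -1}[sign_bits]
    return ndigits, sign

print(decode_lv_tag(8))   # (1, 1): one digit, positive (e.g. 257)
print(decode_lv_tag(10))  # (1, -1): one digit, negative (e.g. -5)
print(decode_lv_tag(1))   # (0, 0): no digits, zero
```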
Prior to CPython 3.12, however, the sign is kept in ob_size instead of what is now lv_tag: a negative integer is stored with a negative digit count in ob_size. So to modify a positive integer into a negative one in CPython 3.11 or an older version:
import ctypes
n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 5
ob_size = ctypes.c_int64.from_address(id(n) + 16)
ob_size.value = -ob_size.value
print(n == -5) # outputs True
print(n is -5) # outputs False
and zero is stored with a zero ob_size:
import ctypes
n = int("257")
ctypes.c_int32.from_address(id(n) + 24).value = 0
ob_size = ctypes.c_int64.from_address(id(n) + 16)
ob_size.value = 0
print(n == 0) # outputs True
print(n is 0) # outputs False
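The two encodings can be summarized in one pure-Python sketch that returns the raw word to store at offset 16 for a given digit count and sign. The function name sign_field_value is hypothetical, and the 3.12+ branch assumes the digit count is shifted above the three low bits of lv_tag.

```python
def sign_field_value(ndigits, sign, version):
    """Return the raw word to store at offset 16 for the given digit count and sign.

    `sign` is 1, 0 or -1; `version` is a (major, minor) tuple.
    """
    if version >= (3, 12):
        # lv_tag: digit count above the three low bits, sign in the lower two bits
        sign_bits = {1: 0, 0: 1, -1: 2}[sign]
        return (ndigits << 3) | sign_bits
    # ob_size: digit count, negated for negative values, zero for zero
    return sign * ndigits

print(sign_field_value(1, -1, (3, 12)))  # 10
print(sign_field_value(1, -1, (3, 11)))  # -1
print(sign_field_value(0, 0, (3, 12)))   # 1
print(sign_field_value(0, 0, (3, 11)))   # 0
```

In practice the version argument could be taken from sys.version_info[:2] to pick the right branch for the running interpreter.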