python c gdb ctypes

Changing a Python integer in memory using the ctypes module and a GDB session


My question is based on this reddit post. The example there shows how to change an integer in memory using the cast function from the ctypes module:

>>> import ctypes
>>> ctypes.cast(id(29), ctypes.POINTER(ctypes.c_long))[3] = 100
>>> 29
100
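
For reference, a read-only sketch of why index 3 is the interesting slot on a 64-bit build: the object header (ob_refcnt, ob_type, ob_size in the GDB dump further down) takes 3 * 8 bytes, so the digit array starts at byte offset 24. The helper name first_digit is made up for illustration, and the sketch assumes CPython, where id() returns the object's address and int.__basicsize__ is the offset of the digit array.

```python
import ctypes
import sys


def first_digit(n):
    """Read ob_digit[0] of a CPython int without modifying the object.

    Assumes CPython and a small non-negative n (below 2**15), so that the
    whole value fits in the first digit.
    """
    # CPython stores digits as 16-bit or 32-bit unsigned integers.
    digit_type = {2: ctypes.c_uint16, 4: ctypes.c_uint32}[sys.int_info.sizeof_digit]
    # int.__basicsize__ is the size of the header that precedes the digits.
    return digit_type.from_address(id(n) + int.__basicsize__).value


print(int.__basicsize__)   # header size in bytes for this build
print(first_digit(29))     # the stored digit for the int 29
```

Reading is harmless; writing to that location, as the reddit example does, changes the shared small-int object for the whole interpreter.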

I'm interested in the low-level internals here, so I checked this in a GDB session by setting a breakpoint on the cast function in CPython:

(gdb) break cast
Function "cast" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (cast) pending.
(gdb) run test.py 
Starting program: /root/.pyenv/versions/3.8.0-debug/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x7ffff00e7b40

Breakpoint 1, cast (ptr=0x9e6e40 <small_ints+1088>, src=10382912, ctype=<_ctypes.PyCPointerType at remote 0xa812a0>) at /root/.pyenv/sources/3.8.0-debug/Python-3.8.0/Modules/_ctypes/_ctypes.c:5540
5540        if (0 == cast_check_pointertype(ctype))
(gdb) p *(PyLongObject *) ptr
$38 = {
  ob_base = {
    ob_base = {
      ob_refcnt = 12, 
      ob_type = 0x9b8060 <PyLong_Type>
    }, 
    ob_size = 1
  }, 
  ob_digit = {100}
}
(gdb) p *((long *) ptr + 3)
$39 = 100
(gdb) p ((long *) ptr + 3)
$40 = (long *) 0x9e6e58 <small_ints+1112>
(gdb) p *((char *) ptr + 3 * 8)
$41 = 100 'd'
(gdb) p ((char *) ptr + 3 * 8)
$42 = 0x9e6e58 <small_ints+1112> "d"
(gdb) set *((long *) ptr + 3) = 29
(gdb) p *((long *) ptr + 3)
$46 = 29
(gdb) p *((char *) ptr + 3 * 8)
$47 = 29 '\035'

I would like to know if it's possible to get the memory address using Python in the GDB session because I couldn't access the returned addresses:

(gdb) python print("{:#x}".format(ctypes.addressof(ctypes.c_int(29))))
0x7f1053c947f0
(gdb) python print("{:#x}".format(id(29)))
0x22699d8
(gdb) p *0x7f1053c947f0
Cannot access memory at address 0x7f1053c947f0
(gdb) p *0x22699d8
Cannot access memory at address 0x22699d8

The indexing is also different compared to the Python REPL. I guess this is related to endianness?

(gdb) python print(ctypes.cast(id(29), ctypes.POINTER(ctypes.c_long))[3])
9
(gdb) python print (ctypes.cast(id(29), ctypes.POINTER(ctypes.c_long))[2])
29

Questions:

  1. Why are the memory addresses returned by Python in the GDB session not accessible, and why are the values not in the process memory range (info proc mappings)?
  2. Why is the indexing different compared to the Python REPL?
  3. (bonus question) I would expect the src parameter in the CPython cast function to hold the address of the object, but it seems to be ptr instead, and after memcpy result->b_ptr points to a different value than &ptr. Is this where the actual casting happens?

Solution

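    1. The Python you reach through GDB's python command is not your debugged Python process. It is GDB's own embedded interpreter, running inside the gdb process. Imagine it as another thread inside of GDB. The addresses returned by id() and ctypes.addressof() there belong to GDB's address space, not to the process being debugged, which is why they fall outside the ranges shown by info proc mappings. Of course, this is a simplification; see the GDB docs for details.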
    2. I was unable to reproduce this behaviour:
      (gdb) python
      >import ctypes
      >print(ctypes.cast(id(29), ctypes.POINTER(ctypes.c_long))[3])
      >end
      29
      
      I can't think of any reason this behaviour would happen (least of all endianness, which is the same across your entire system*)
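    3. The src parameter is the original Python object that was passed to cast, not a raw address, while ptr is that same argument already converted to a C void *. In your trace src=10382912 is the int returned by id(29), and 10382912 is 0x9e6e40, the same value as ptr. For reference, see ctypes.h and ctypes/__init__.py (_SimpleCData is just CDataObject with some helpers like indexing and repr). And yes, the memcpy is what does the actual casting in this case: it copies the pointer value held in ptr into the buffer that result->b_ptr points to, so b_ptr itself is not equal to &ptr; only the buffer's contents equal ptr. If the source is itself a ctypes object, there is additional work beforehand.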
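
    The effect of that memcpy can be observed from plain Python. The following is a minimal sketch, assuming CPython's ctypes; it casts the address of a ctypes-owned c_long rather than an int object, so nothing in the interpreter is overwritten. The variable names are chosen for illustration only.

```python
import ctypes

x = ctypes.c_long(29)          # a C long owned by ctypes
addr = ctypes.addressof(x)     # its address as a plain Python int

# Cast the integer address to a pointer-to-long, as in the question.
p = ctypes.cast(addr, ctypes.POINTER(ctypes.c_long))

# The pointer object has its own buffer (b_ptr in the C source);
# addressof(p) is the address of that buffer.
buf_addr = ctypes.addressof(p)
# The buffer's contents are the pointer value that was copied in.
stored = ctypes.c_void_p.from_address(buf_addr).value

print(p[0])               # value read through the pointer
print(stored == addr)     # the buffer holds the address that was cast
print(buf_addr != addr)   # the buffer itself lives somewhere else
```

    In other words, the new pointer object stores a copy of the address; it does not alias the C variable ptr.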

    * Except on ARM, where you can change endianness with an instruction
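
    To inspect the debugged process from GDB's Python, the gdb module has to be used instead of id() or ctypes. A hedged sketch follows, assuming a GDB built against Python 3 and an 8-byte long as in the question; the helper names decode_long and read_inferior_long are hypothetical, and the import guard only exists so the file can also be loaded outside GDB.

```python
import sys

try:
    import gdb  # only importable inside GDB's embedded interpreter
except ImportError:
    gdb = None


def decode_long(raw):
    """Interpret raw bytes as a native-endian signed integer."""
    return int.from_bytes(bytes(raw), sys.byteorder, signed=True)


def read_inferior_long(address, size=8):
    """Read `size` bytes from the debugged process and decode them."""
    if gdb is None:
        raise RuntimeError("run this inside GDB, e.g. via the python command")
    # Inferior.read_memory reads from the debugged process, not from GDB itself.
    raw = gdb.selected_inferior().read_memory(address, size)
    return decode_long(raw)


# Inside the GDB session from the question, one might try:
#   (gdb) python print(read_inferior_long(0x9e6e58))
```

    The address 0x9e6e58 is the one GDB printed for ((long *) ptr + 3) in the question; any address taken from the inferior (for example via gdb.parse_and_eval) should work the same way.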