I'm using python bytes objects to pass some data to natively implemented methods using the CFFI library, for example:
from cffi import FFI
ffi = FFI()
lib = ffi.dlopen(libname)
ffi.cdef("""
void foo(char*);
""")
x = b'abc123'
lib.foo(x)
As far as I understand, the pointer received by the native method is that of the actual underlying buffer behind the x bytes object. This works fine 99% of the time, but sometimes the pointer seems to get invalidated and the data it points to contains garbage, some time after the native call has finished - the native code keeps the pointer around after returning from the initial call and expects the data to be present there, and the python code makes sure to keep a reference to x so that the pointer, hopefully, remains valid.
In these cases, if I call a native method with the same bytes object again, I can see that I get a different pointer pointing to the same value but located at a different address, indicating that the underlying buffer behind the bytes object has moved (if my assumption about CFFI extracting a pointer to the underlying array contained by the bytes object is correct, and a temporary copy is not created anywhere), even though, to the best of my knowledge, the bytes object has not been modified in any way (the code is part of a large codebase, but I'm reasonably sure that the bytes objects do not get modified directly by code).
What could be happening here? Is my assumption about CFFI getting a pointer to the actual internal buffer of the bytes object incorrect? Is python maybe allowed to silently reallocate the buffers behind bytes objects for garbage collection / memory compaction reasons, and does that unaware of me holding a pointer to it? I'm using pypy instead of the default python interpreter, if that makes a difference.
Your guess is the correct answer. The (documented) guarantee is only that the pointer passed in this case is valid for the duration of the call.
PyPy's garbage collector can move objects in memory, if they are small enough that doing so is a win in overall performance. When doing such a cffi call, though, pypy will generally mark the object as "pinned" for the duration of the call (unless there are already too many pinned objects and adding more would seriously hurt future GC performance; in this rare case it will make a copy anyway and free it afterwards).
If your C code needs to access the memory after the call returned, you have to make explicitly a copy, eg with ffi.new("char[]", mybytes), and keep it alive as long as needed.