python pointers multiprocessing segmentation-fault

Sharing a C pointer between processes with multiprocessing in Python


I tried to add Tree-sitter support to a Python program that uses multiprocessing, but the Python process crashes. A minimal reproducible example is further below.

I need to send the language data through a Queue to another Process. I cannot simply import the grammar on the other side: the first process is an API the user interacts with, and the second process does the more computationally expensive work so that it does not clog the main thread. The user must therefore be able to pass in any Tree-sitter grammar they find lying around, and I need to be able to work with it.

Since tree_sitter.Language is not serializable, I need to go through the C pointer address returned by some_tree_sitter_language.language() (print it and you get an int that corresponds to the memory address; see here). Reasonably, or so I thought, I decided to send the memory address to the other Process through a shared Queue, as my program already does that. That is when I learned that the address is not valid on the other side because it is, you guessed it, a separate process with its own address space.
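
To make that concrete, here is a minimal single-process sketch of what a Queue actually transports. It uses a `ctypes` buffer as a stand-in for the grammar's C struct (an assumption for illustration only; it says nothing about how tree_sitter allocates its data). Pickling the address yields only the integer; none of the bytes it points to travel with it.

```python
import ctypes
import pickle

# Stand-in for a C struct owned by this process (hypothetical data).
buf = ctypes.create_string_buffer(b"grammar tables")
addr = ctypes.addressof(buf)  # a plain int, like the one language() gives in the question

# multiprocessing.Queue pickles whatever is put on it.
payload = pickle.dumps(addr)

# The pickle holds just the number; the pointed-to bytes are not included.
received = pickle.loads(payload)
print(received == addr)        # True
print(b"grammar" in payload)   # False

# Inside the owning process the address is still valid...
print(ctypes.string_at(addr))  # b'grammar tables'
# ...but another process has its own address space, so dereferencing
# `received` there is invalid and can crash, as in the log further below.
```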

I hear your cries reminding me of multiprocessing.shared_memory.SharedMemory(), but unfortunately that requires the size in bytes up front, and as Python so helpfully told me:

>>> from tree_sitter_python import language
>>> from tree_sitter import Language
>>> from ctypes import sizeof
>>> sizeof(Language(language()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: this type has no size

If anyone has ideas or solutions for sharing the memory address between the two Processes (that also work on Linux, Mac, and Windows), please let me know. Thank you for your time!

MRE:

from multiprocessing import Queue, Process
from tree_sitter_python import language
from tree_sitter import Language
from time import sleep

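def test(q: Queue):
    # x is the raw address produced in the parent process
    x = q.get()
    print("BEFORE")
    # Crashes here: the address is not mapped in this process
    other_side_lang: Language = Language(x)
    print("AFTER", other_side_lang)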

if __name__ == "__main__":
    q = Queue()
    q.put(language())

    x = Process(
        target=test,
        args=(q,),
        daemon=True
    )
    x.start()
    sleep(3)

Attached is a partial copy of one of the crash logs:

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               Python [639]
Path:                  /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
Identifier:            org.python.python
Version:               3.12.4 (3.12.4)
Code Type:             X86-64 (Native)
Parent Process:        Python [633]
Responsible:           Terminal [441]
User ID:               501

Date/Time:             2024-07-11 02:37:55.4483 -0600
OS Version:            macOS 13.6.1 (22G313)
Report Version:        12
Bridge OS Version:     8.1 (21P1069)
Anonymous UUID:        428C86B5-033D-A2E7-748C-6F0ECD68C1FD


Time Awake Since Boot: 26 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000106e06040
Exception Codes:       0x0000000000000001, 0x0000000106e06040

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [639]

VM Region Info: 0x106e06040 is not in any region.  Bytes before following region: 132186048
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      10ec16000-10ec1a000    [   16K] r-x/r-x SM=COW  .../MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   _binding.cpython-312-darwin.so         0x10f1cb004 ts_language_version + 4
1   _binding.cpython-312-darwin.so         0x10f1c258c language_init + 92
2   Python                                 0x10f2ed918 type_call + 135
3   Python                                 0x10f283e62 _PyObject_MakeTpCall + 140
4   Python                                 0x10f37a005 _PyEval_EvalFrameDefault + 51128
5   Python                                 0x10f36d691 PyEval_EvalCode + 197
6   Python                                 0x10f3d3e48 run_eval_code_obj.llvm.14544499641581592508 + 83
7   Python                                 0x10f3d20e7 run_mod.llvm.14544499641581592508 + 107
8   Python                                 0x10f3d1888 PyRun_StringFlags + 113
9   Python                                 0x10f3d17d0 PyRun_SimpleStringFlags + 68
10  Python                                 0x10f3f4361 Py_RunMain + 714
11  Python                                 0x10f3f49b0 pymain_main + 378
12  Python                                 0x10f3f4a63 Py_BytesMain + 42
13  dyld                                0x7ff80164c41f start + 1903


Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000106e06040  rbx: 0x00000000ffffffff  rcx: 0x0000000106e06040  rdx: 0x0000000000000004
  rdi: 0x0000000106e06040  rsi: 0x0000000106e06040  rbp: 0x00007ff7b12e8b20  rsp: 0x00007ff7b12e8b20
   r8: 0x000000010f4c4500   r9: 0x00007ff7b12e8910  r10: 0x00007ff7b12e87f0  r11: 0x0000000000000100
  r12: 0x0000000000000000  r13: 0x000000010f7221c8  r14: 0x000000010ef5bb30  r15: 0x000000010f0ce710
  rip: 0x000000010f1cb004  rfl: 0x0000000000010202  cr2: 0x0000000106e06040
  
Logical CPU:     2
Error Code:      0x00000004 (no mapping for user data read)
Trap Number:     14

Binary Images:
       0x10ec16000 -        0x10ec19fff org.python.python (3.12.4) <3079272b-e686-3efd-8d24-c01f62fdb7c2> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
       0x10f224000 -        0x10f54ffff org.python.python (3.12.4, (c) 2001-2023 Python Software Foundation.) <17e72ebd-32f4-3ba9-b4db-906f26230882> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/Python
       0x10ef9b000 -        0x10efa2fff _struct.cpython-312-darwin.so (*) <4a0e6449-166d-3ce3-a72c-f3808ef391d4> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_struct.cpython-312-darwin.so
       0x10efd4000 -        0x10efebfff _pickle.cpython-312-darwin.so (*) <c6d9e37d-5eb6-3db6-89d7-cd82ed1a467d> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_pickle.cpython-312-darwin.so
       0x10effc000 -        0x10f00bfff _socket.cpython-312-darwin.so (*) <432b133c-3887-3890-b371-cac14ab461df> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_socket.cpython-312-darwin.so
       0x10f018000 -        0x10f023fff math.cpython-312-darwin.so (*) <5ebc3b1b-4a5f-3fc5-8078-6760fbd4d083> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/math.cpython-312-darwin.so
       0x10efaf000 -        0x10efb6fff select.cpython-312-darwin.so (*) <8b0d9d36-b536-33b7-a864-ee2ecce437a9> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/select.cpython-312-darwin.so
       0x10f044000 -        0x10f04bfff array.cpython-312-darwin.so (*) <83c7e987-75d3-3e37-bf34-c7648fbe9788> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/array.cpython-312-darwin.so
       0x10efc3000 -        0x10efc6fff fcntl.cpython-312-darwin.so (*) <4cea1d2a-2394-31c6-93db-2c96d9d471b3> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/fcntl.cpython-312-darwin.so
       0x10f030000 -        0x10f033fff _posixsubprocess.cpython-312-darwin.so (*) <a5e7a342-0997-3c1e-9cbf-dc756a36ebb0> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_posixsubprocess.cpython-312-darwin.so
       0x10f058000 -        0x10f05bfff _multiprocessing.cpython-312-darwin.so (*) <a940b46c-f9f3-3147-8994-33b0ba07f09e> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_multiprocessing.cpython-312-darwin.so
       0x10f068000 -        0x10f06bfff _posixshmem.cpython-312-darwin.so (*) <13aad305-b26b-3ad8-b21f-48a89ea8358a> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_posixshmem.cpython-312-darwin.so
       0x10f7e8000 -        0x10f85ffff _binding.abi3.so (*) <0e713bee-dfd3-37e7-8812-a066a65b0f33> /usr/local/lib/python3.12/site-packages/tree_sitter_python/_binding.abi3.so
       0x10f1c0000 -        0x10f1f3fff _binding.cpython-312-darwin.so (*) <dae6b362-3e05-3cdc-b2eb-aaef397ef712> /usr/local/lib/python3.12/site-packages/tree_sitter/_binding.cpython-312-darwin.so
       0x10f178000 -        0x10f17bfff _heapq.cpython-312-darwin.so (*) <682fe7fb-3ffb-3cb3-82a9-fdd75c511b40> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_heapq.cpython-312-darwin.so
       0x10f188000 -        0x10f18bfff _queue.cpython-312-darwin.so (*) <b569f10c-ca95-3473-b27e-3a2429d1c4dd> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_queue.cpython-312-darwin.so
       0x10f198000 -        0x10f19ffff zlib.cpython-312-darwin.so (*) <43492c33-fe03-34d7-b369-8fc8adaabf29> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/zlib.cpython-312-darwin.so
       0x10f1ac000 -        0x10f1affff _bz2.cpython-312-darwin.so (*) <a93b8231-452d-3dcb-8a41-25fc609a5022> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_bz2.cpython-312-darwin.so
       0x10f208000 -        0x10f20ffff _lzma.cpython-312-darwin.so (*) <36ffa772-1c73-310c-84e5-c18e71179618> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_lzma.cpython-312-darwin.so
       0x10f891000 -        0x10f8b0fff liblzma.5.dylib (*) <74cc93b3-d104-3692-ab7a-7348e6fc3cdc> /usr/local/Cellar/xz/5.4.6/lib/liblzma.5.dylib
       0x10f868000 -        0x10f86bfff _bisect.cpython-312-darwin.so (*) <cb668a54-c2d6-3b89-aebb-90eb973e11bc> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_bisect.cpython-312-darwin.so
       0x10f878000 -        0x10f87bfff _random.cpython-312-darwin.so (*) <f1e00bab-9951-3ed7-b9d8-64fc054ffa41> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_random.cpython-312-darwin.so
       0x10f8d3000 -        0x10f8defff _sha2.cpython-312-darwin.so (*) <ddfdfed3-159b-34b5-84a5-95ba92ca83ae> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_sha2.cpython-312-darwin.so
    0x7ff801646000 -     0x7ff8016de5ef dyld (*) <3df96f32-b9c9-3566-a6b7-4daebc6d6563> /usr/lib/dyld
               0x0 - 0xffffffffffffffff ??? (*) <00000000-0000-0000-0000-000000000000> ???
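
A quick arithmetic check on the numbers in the log above may help when reading it. The faulting address also appears in `rdi`, which under the x86-64 calling convention used on macOS carries the first argument of the crashing function, so it is most likely the integer that came off the queue. The sketch below only re-derives the "Bytes before following region" figure from two values copied from the report; the variable names are mine.

```python
# Values copied from the crash report above.
fault_addr = 0x0000000106E06040   # KERN_INVALID_ADDRESS, also in rdi/rax/cr2
next_region_start = 0x10EC16000   # start of the first mapped __TEXT region

# Distance from the bad pointer to the nearest mapped memory in the child.
gap = next_region_start - fault_addr
print(gap)  # 132186048, matching "Bytes before following region"
```

In other words, the pointer falls in a part of the child's address space where nothing is mapped at all.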

Additionally attached is a photo: Photo


Solution

  • Because memory is specific to each process, I can't send the address. Functions, however, are serializable in Python, so instead of sending the C pointer from language() I can send the function itself and call it on the other side:

    from multiprocessing import Queue, Process
    from tree_sitter_python import language
    from tree_sitter import Language
    from time import sleep
    
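    def test(q: Queue):
        # x is now the language function itself, not an address
        x = q.get()
        print("BEFORE")
        # Calling it here yields a pointer that is valid in this process
        other_side_lang: Language = Language(x())
        print("AFTER", other_side_lang)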
    
    if __name__ == "__main__":
        q = Queue()
        q.put(language)
    
        x = Process(
            target=test,
            args=(q,),
            daemon=True
        )
        x.start()
        sleep(3)
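
    This works because pickle stores a module-level function by reference (its module and name) rather than by value, so the receiving process imports the module itself and ends up with its own valid pointer. Below is a minimal sketch of the same idea using explicit strings, which may suit arbitrary user-supplied grammars. `load_factory` is a hypothetical helper, and `math.sqrt` stands in for a grammar's `language` function so the snippet runs without any grammar installed.

```python
import importlib
import math
import pickle


def load_factory(module_name: str, attr_name: str = "language"):
    """Import module_name in the current process and return one attribute."""
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Pickling a module-level function records where to find it, not its code.
payload = pickle.dumps(math.sqrt)
print(b"math" in payload and b"sqrt" in payload)  # True

# Unpickling looks the function up again by name in the receiving process.
print(pickle.loads(payload) is math.sqrt)         # True

# The same lookup done by hand: send two strings through the queue instead.
factory = load_factory("math", "sqrt")  # stand-in for ("tree_sitter_python", "language")
print(factory(9.0))                     # 3.0
```

    In the child process this would become something like `Language(load_factory("tree_sitter_python")())`, assuming the grammar package is importable there and exposes `language()` as in the question.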