I tried implementing tree-sitter support in Python using multiprocessing, but the Python process crashes. A Minimal Reproducible Example is further below.
I need to send the language data through a Queue to another Process. I can't just import it on the other side: the first process is an API the user interacts with, and the second process does the more computationally expensive work so that it doesn't clog the main thread. Because of this, the user needs to be able to use any tree-sitter grammar they find lying around, and I need to be able to work with it.
Since tree_sitter.Language is not serializable, I need to do it through the C pointer address returned by some_tree_sitter_language.language() (print it and you get an int that corresponds to the memory address; see here). Reasonably, or so I thought, I decided to send the memory address to the other Process through a shared Queue, as my program already does that. It was at that point I learned that the issue with just sending the address is that it doesn't line up on the other side since it's, you guessed it, a separate process with its own address space.
I hear your cries reminding me of multiprocessing.shared_memory.SharedMemory(), but unfortunately that requires you to provide the size in bytes, and as Python so helpfully told me:
>>> from tree_sitter_python import language
>>> from tree_sitter import Language
>>> from ctypes import sizeof
>>> sizeof(Language(language()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: this type has no size
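As far as I can tell, that error just means ctypes.sizeof() only understands ctypes types and instances, and Language is an ordinary extension type, so no size is available this way. A minimal sketch of the same behaviour with only the standard library (NotCtypes is a hypothetical stand-in for Language, not part of tree-sitter):

```python
import ctypes

# sizeof() works for ctypes types, for example a raw pointer type.
pointer_size = ctypes.sizeof(ctypes.c_void_p)
print("pointer size in bytes:", pointer_size)


class NotCtypes:
    """Hypothetical stand-in for any non-ctypes object, such as Language."""


# Passing anything that is not a ctypes type or instance raises TypeError.
try:
    ctypes.sizeof(NotCtypes())
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

So even knowing the pointer size doesn't tell me how much memory the grammar data behind that pointer occupies.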
If anyone has any ideas or solutions for sharing the language between the two Processes (that also work on Linux, Mac, and Windows), please let me know. Thank you for your time!
MRE:
from multiprocessing import Queue, Process
from tree_sitter_python import language
from tree_sitter import Language
from time import sleep


def test(q: Queue):
    x = q.get()
    print("BEFORE")
    other_side_lang: Language = Language(x)
    print("AFTER", other_side_lang)


if __name__ == "__main__":
    q = Queue()
    q.put(language())
    x = Process(
        target=test,
        args=(q,),
        daemon=True
    )
    x.start()
    sleep(3)
Attached is a partial copy of one of the crash logs:
-------------------------------------
Translated Report (Full Report Below)
-------------------------------------
Process: Python [639]
Path: /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
Identifier: org.python.python
Version: 3.12.4 (3.12.4)
Code Type: X86-64 (Native)
Parent Process: Python [633]
Responsible: Terminal [441]
User ID: 501
Date/Time: 2024-07-11 02:37:55.4483 -0600
OS Version: macOS 13.6.1 (22G313)
Report Version: 12
Bridge OS Version: 8.1 (21P1069)
Anonymous UUID: 428C86B5-033D-A2E7-748C-6F0ECD68C1FD
Time Awake Since Boot: 26 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000106e06040
Exception Codes: 0x0000000000000001, 0x0000000106e06040
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [639]
VM Region Info: 0x106e06040 is not in any region. Bytes before following region: 132186048
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 10ec16000-10ec1a000 [ 16K] r-x/r-x SM=COW .../MacOS/Python
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 _binding.cpython-312-darwin.so 0x10f1cb004 ts_language_version + 4
1 _binding.cpython-312-darwin.so 0x10f1c258c language_init + 92
2 Python 0x10f2ed918 type_call + 135
3 Python 0x10f283e62 _PyObject_MakeTpCall + 140
4 Python 0x10f37a005 _PyEval_EvalFrameDefault + 51128
5 Python 0x10f36d691 PyEval_EvalCode + 197
6 Python 0x10f3d3e48 run_eval_code_obj.llvm.14544499641581592508 + 83
7 Python 0x10f3d20e7 run_mod.llvm.14544499641581592508 + 107
8 Python 0x10f3d1888 PyRun_StringFlags + 113
9 Python 0x10f3d17d0 PyRun_SimpleStringFlags + 68
10 Python 0x10f3f4361 Py_RunMain + 714
11 Python 0x10f3f49b0 pymain_main + 378
12 Python 0x10f3f4a63 Py_BytesMain + 42
13 dyld 0x7ff80164c41f start + 1903
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000106e06040 rbx: 0x00000000ffffffff rcx: 0x0000000106e06040 rdx: 0x0000000000000004
rdi: 0x0000000106e06040 rsi: 0x0000000106e06040 rbp: 0x00007ff7b12e8b20 rsp: 0x00007ff7b12e8b20
r8: 0x000000010f4c4500 r9: 0x00007ff7b12e8910 r10: 0x00007ff7b12e87f0 r11: 0x0000000000000100
r12: 0x0000000000000000 r13: 0x000000010f7221c8 r14: 0x000000010ef5bb30 r15: 0x000000010f0ce710
rip: 0x000000010f1cb004 rfl: 0x0000000000010202 cr2: 0x0000000106e06040
Logical CPU: 2
Error Code: 0x00000004 (no mapping for user data read)
Trap Number: 14
Binary Images:
0x10ec16000 - 0x10ec19fff org.python.python (3.12.4) <3079272b-e686-3efd-8d24-c01f62fdb7c2> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
0x10f224000 - 0x10f54ffff org.python.python (3.12.4, (c) 2001-2023 Python Software Foundation.) <17e72ebd-32f4-3ba9-b4db-906f26230882> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/Python
0x10ef9b000 - 0x10efa2fff _struct.cpython-312-darwin.so (*) <4a0e6449-166d-3ce3-a72c-f3808ef391d4> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_struct.cpython-312-darwin.so
0x10efd4000 - 0x10efebfff _pickle.cpython-312-darwin.so (*) <c6d9e37d-5eb6-3db6-89d7-cd82ed1a467d> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_pickle.cpython-312-darwin.so
0x10effc000 - 0x10f00bfff _socket.cpython-312-darwin.so (*) <432b133c-3887-3890-b371-cac14ab461df> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_socket.cpython-312-darwin.so
0x10f018000 - 0x10f023fff math.cpython-312-darwin.so (*) <5ebc3b1b-4a5f-3fc5-8078-6760fbd4d083> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/math.cpython-312-darwin.so
0x10efaf000 - 0x10efb6fff select.cpython-312-darwin.so (*) <8b0d9d36-b536-33b7-a864-ee2ecce437a9> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/select.cpython-312-darwin.so
0x10f044000 - 0x10f04bfff array.cpython-312-darwin.so (*) <83c7e987-75d3-3e37-bf34-c7648fbe9788> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/array.cpython-312-darwin.so
0x10efc3000 - 0x10efc6fff fcntl.cpython-312-darwin.so (*) <4cea1d2a-2394-31c6-93db-2c96d9d471b3> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/fcntl.cpython-312-darwin.so
0x10f030000 - 0x10f033fff _posixsubprocess.cpython-312-darwin.so (*) <a5e7a342-0997-3c1e-9cbf-dc756a36ebb0> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_posixsubprocess.cpython-312-darwin.so
0x10f058000 - 0x10f05bfff _multiprocessing.cpython-312-darwin.so (*) <a940b46c-f9f3-3147-8994-33b0ba07f09e> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_multiprocessing.cpython-312-darwin.so
0x10f068000 - 0x10f06bfff _posixshmem.cpython-312-darwin.so (*) <13aad305-b26b-3ad8-b21f-48a89ea8358a> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_posixshmem.cpython-312-darwin.so
0x10f7e8000 - 0x10f85ffff _binding.abi3.so (*) <0e713bee-dfd3-37e7-8812-a066a65b0f33> /usr/local/lib/python3.12/site-packages/tree_sitter_python/_binding.abi3.so
0x10f1c0000 - 0x10f1f3fff _binding.cpython-312-darwin.so (*) <dae6b362-3e05-3cdc-b2eb-aaef397ef712> /usr/local/lib/python3.12/site-packages/tree_sitter/_binding.cpython-312-darwin.so
0x10f178000 - 0x10f17bfff _heapq.cpython-312-darwin.so (*) <682fe7fb-3ffb-3cb3-82a9-fdd75c511b40> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_heapq.cpython-312-darwin.so
0x10f188000 - 0x10f18bfff _queue.cpython-312-darwin.so (*) <b569f10c-ca95-3473-b27e-3a2429d1c4dd> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_queue.cpython-312-darwin.so
0x10f198000 - 0x10f19ffff zlib.cpython-312-darwin.so (*) <43492c33-fe03-34d7-b369-8fc8adaabf29> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/zlib.cpython-312-darwin.so
0x10f1ac000 - 0x10f1affff _bz2.cpython-312-darwin.so (*) <a93b8231-452d-3dcb-8a41-25fc609a5022> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_bz2.cpython-312-darwin.so
0x10f208000 - 0x10f20ffff _lzma.cpython-312-darwin.so (*) <36ffa772-1c73-310c-84e5-c18e71179618> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_lzma.cpython-312-darwin.so
0x10f891000 - 0x10f8b0fff liblzma.5.dylib (*) <74cc93b3-d104-3692-ab7a-7348e6fc3cdc> /usr/local/Cellar/xz/5.4.6/lib/liblzma.5.dylib
0x10f868000 - 0x10f86bfff _bisect.cpython-312-darwin.so (*) <cb668a54-c2d6-3b89-aebb-90eb973e11bc> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_bisect.cpython-312-darwin.so
0x10f878000 - 0x10f87bfff _random.cpython-312-darwin.so (*) <f1e00bab-9951-3ed7-b9d8-64fc054ffa41> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_random.cpython-312-darwin.so
0x10f8d3000 - 0x10f8defff _sha2.cpython-312-darwin.so (*) <ddfdfed3-159b-34b5-84a5-95ba92ca83ae> /usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_sha2.cpython-312-darwin.so
0x7ff801646000 - 0x7ff8016de5ef dyld (*) <3df96f32-b9c9-3566-a6b7-4daebc6d6563> /usr/lib/dyld
0x0 - 0xffffffffffffffff ??? (*) <00000000-0000-0000-0000-000000000000> ???
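Because memory is specific to each Process, I can't send the address. Module-level functions, however, are picklable in Python (they are serialized by reference, as a module and a name that the receiving process imports), so instead of sending the C pointer from language() I can send the function itself and call it on the other side: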
from multiprocessing import Queue, Process
from tree_sitter_python import language
from tree_sitter import Language
from time import sleep


def test(q: Queue):
    # x is the language function itself, not a pointer from the parent.
    x = q.get()
    print("BEFORE")
    # Calling x() here yields an address that is valid in this process.
    other_side_lang: Language = Language(x())
    print("AFTER", other_side_lang)


if __name__ == "__main__":
    q = Queue()
    # Send the function, not the result of calling it.
    q.put(language)
    x = Process(
        target=test,
        args=(q,),
        daemon=True
    )
    x.start()
    sleep(3)
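To see why this works, here is a small sketch using only the standard library. It uses json.dumps as a stand-in for a grammar's language function (that substitution is my assumption for illustration, since any importable module-level function behaves the same way), and can_pickle is a hypothetical helper:

```python
import json
import pickle

# Pickling a module-level function stores its module and name, not its code
# and not any memory address.
payload = pickle.dumps(json.dumps)
print(payload)

# Unpickling imports the module and looks the name up again, which is what
# the receiving process does when it takes the function off the Queue.
restored = pickle.loads(payload)
print(restored is json.dumps)


def can_pickle(obj) -> bool:
    """Hypothetical helper: report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# A lambda has no importable name, so it cannot be sent this way.
lambda_ok = can_pickle(lambda: 0)
print("lambda picklable:", lambda_ok)
```

The catch is that the function has to be importable by name in the receiving process, so the grammar package must be installed where the second process can import it. If a grammar is only known by its module name, sending that name as a string and importing it on the other side should work on the same principle.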