Tags: c, thread-safety, tcl, coredump, panic

alloc: invalid block - Are Tcl_IncrRefCount and Tcl_DecrRefCount thread safe for threaded Tcl / 1 interp per thread?


Our 32-bit server application statically embeds Tcl 8.4.11. On 64-bit Red Hat Linux 6.5 we're seeing crashes with core dumps. The failure message looks like:

alloc: invalid block: 0xf6f00f58: 88 f6 0

At the bottom of the question, I've documented two different core dumps we've seen.

We've isolated a potential root cause: a Tcl object shared between two threads that concurrently run separate Tcl interpreter instances. We suspect the crash happens because the same Tcl object is passed to Tcl_IncrRefCount / Tcl_DecrRefCount from both concurrently executing interpreters.

  1. Are Tcl_IncrRefCount / Tcl_DecrRefCount thread safe when Tcl is compiled threaded?
  2. Are Tcl objects shared by Tcl interpreter instances? Is there any way to disable Tcl object sharing across interpreter instances?
  3. Is the situation any better in Tcl version 8.6.3?

(gdb) bt
#0  __kernel_vsyscall () at arch/x86/vdso/vdso32/sysenter.S:49
#1  0x001b7871 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x001b914a in abort () at abort.c:92
#3  0x080f611c in Tcl_PanicVA ()
#4  0x080f613b in Tcl_Panic ()
#5  0x0810133c in Ptr2Block ()
#6  0x08100e04 in TclpFree ()
#7  0x080b46a7 in Tcl_Free ()
#8  0x08100686 in FreeStringInternalRep ()
#9  0x080fdac1 in ResetObjResult ()
#10 0x080fd316 in Tcl_GetStringResult ()
#11 0x0808aaad in run_tcl_proc (pDevice=0x8e0ba08, pInterp=0x8d798c0, iNumArgs=2, objv=0x115434c, bIsCommand=0 '\000', pCommand=0x0)
#12 0x08093672 in Tcl_begin_next_state (pDevice=0x8e0ba08, iNextState=RunPoll, pCommand=0x0)
#13 0x08093759 in Tcl_port_thread (dummy=0x8d1cab8)
#14 0x008bcb39 in start_thread (arg=0x1154b70) at pthread_create.c:301
#15 0x0026fc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
(gdb)

(gdb) bt
#0  __kernel_vsyscall () at arch/x86/vdso/vdso32/sysenter.S:49
#1  0x00395871 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x0039714a in abort () at abort.c:92
#3  0x080f611c in Tcl_PanicVA ()
#4  0x080f613b in Tcl_Panic ()
#5  0x0810133c in Ptr2Block ()
#6  0x08100e04 in TclpFree ()
#7  0x080b46a7 in Tcl_Free ()
#8  0x080d21b6 in TclExecuteByteCode ()
#9  0x080d1bc1 in TclCompEvalObj ()
#10 0x080fbd5c in TclObjInterpProc ()
#11 0x080b026a in TclEvalObjvInternal ()
#12 0x080d2716 in TclExecuteByteCode ()
#13 0x080d1bc1 in TclCompEvalObj ()
#14 0x080fbd5c in TclObjInterpProc ()
#15 0x080b026a in TclEvalObjvInternal ()
#16 0x080b0517 in Tcl_EvalObjv ()
#17 0x0808aa02 in run_tcl_proc (pDevice=0x94a2500, pInterp=0xac2bba0, iNumArgs=2, objv=0x11b034c, bIsCommand=0 '\000', pCommand=0x0)
#18 0x08093672 in Tcl_begin_next_state (pDevice=0x94a2500, iNextState=RunPoll, pCommand=0x0)
#19 0x08093759 in Tcl_port_thread (dummy=0x9365e98)
#20 0x00356b39 in start_thread (arg=0x11b0b70) at pthread_create.c:301
#21 0x0044dc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
(gdb)

Solution

  • Tcl_IncrRefCount (actually a simple macro) and Tcl_DecrRefCount (a more complicated macro) are thread safe only in a limited sense: they are safe because each Tcl_Obj should only ever be accessed from the thread that created it. Parallel calls to Tcl_IncrRefCount and Tcl_DecrRefCount are fine, so long as they operate on different values. The upside is that accesses don't need locking (and the memory manager for Tcl_Obj structures takes advantage of this).

    Note that multi-threaded access to a single value is not a good plan unless you're very careful, since even reader operations like Tcl_GetIntFromObj can write to the underlying structure if a type transformation needs to be applied. These operations are not locked. Doing it at all requires intimate knowledge of the current type of the value (not something you're usually encouraged to think about in Tcl in the first place, though tcl::unsupported::representation can help with probing this in 8.6) and some very careful interlocking between the threads so that one isn't writing while the other is peeking. "Don't do this at all", while not 100% accurate, is the advice least likely to lead to headaches.

    You should probably read more about how threads are meant to be used with Tcl. The ActiveState blog has a reasonable introduction.
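To make the two points above concrete, here is a minimal sketch in plain C. It does not use the Tcl library: ToyObj and the toy_* functions are hypothetical stand-ins invented for illustration, not the real Tcl_Obj layout or API. The sketch shows (a) why a "reader" such as Tcl_GetIntFromObj counts as a write, since it may cache a converted representation inside the structure, and (b) one way to keep every value owned by a single thread: hand over only a copy of the bytes and let the receiving side build its own object. In real Tcl code this would roughly correspond to taking the string with Tcl_GetStringFromObj in the owning thread and calling Tcl_NewStringObj in the receiving thread, but treat that mapping as an assumption about your application rather than a recipe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Tcl value. NOT the real Tcl_Obj layout. */
typedef struct ToyObj {
    int   refCount;   /* plain int: no atomics, no locks */
    char *bytes;      /* string representation (always valid in this toy) */
    int   hasIntRep;  /* 0 until the integer form has been cached */
    long  intRep;     /* cached integer form */
} ToyObj;

/* Create a new object that owns a private copy of the string s. */
ToyObj *toy_new(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    ToyObj *o = malloc(sizeof *o);
    if (o == NULL) abort();
    o->bytes = malloc(n);
    if (o->bytes == NULL) abort();
    memcpy(o->bytes, s, n);
    o->refCount = 0;
    o->hasIntRep = 0;
    o->intRep = 0;
    return o;
}

/* Unsynchronized increment: only safe if one thread owns the object. */
void toy_incr(ToyObj *o) { o->refCount++; }

/* Unsynchronized decrement; frees the object and returns 1 when the
 * count drops to zero or below, otherwise returns 0. */
int toy_decr(ToyObj *o) {
    if (--o->refCount <= 0) {
        free(o->bytes);
        free(o);
        return 1;
    }
    return 0;
}

/* A "reader" that writes: the first call converts the string and stores
 * the result inside the object, so concurrent callers on the same object
 * would race on hasIntRep and intRep. */
long toy_get_int(ToyObj *o) {
    if (!o->hasIntRep) {
        o->intRep = strtol(o->bytes, NULL, 10);
        o->hasIntRep = 1;
    }
    return o->intRep;
}

/* Hand a value to another thread by copying only its bytes. The result
 * shares no memory with src, so each side keeps its own refCount and
 * its own cached representation. */
ToyObj *toy_copy_for_other_thread(const ToyObj *src) {
    return toy_new(src->bytes);
}
```

In use, the owning thread would call toy_copy_for_other_thread (or simply pass the raw string) and give the result to the other thread, which then does its own toy_incr / toy_decr on its private copy. The original can be released independently without either side touching the other's reference count.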