Our 32-bit server application statically embeds tcl 8.4.11. On Red Hat Linux 6.5 64-bit we're encountering crashes / core dumps. The failure looks like:
alloc: invalid block: 0xf6f00f58: 88 f6 0
At the bottom of the question, I've documented two different core dumps we've seen.
We've isolated a potential root cause to a TCL object shared between two threads concurrently running separate TCL interpreter instances. We think it's because TCL object is passed to Tcl_IncrRefCount / Tcl_DecrRefCount from these concurrently executing TCL interpreters.
(gdb) bt
#0 __kernel_vsyscall () at arch/x86/vdso/vdso32/sysenter.S:49
#1 0x001b7871 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x001b914a in abort () at abort.c:92
#3 0x080f611c in Tcl_PanicVA ()
#4 0x080f613b in Tcl_Panic ()
#5 0x0810133c in Ptr2Block ()
#6 0x08100e04 in TclpFree ()
#7 0x080b46a7 in Tcl_Free ()
#8 0x08100686 in FreeStringInternalRep ()
#9 0x080fdac1 in ResetObjResult ()
#10 0x080fd316 in Tcl_GetStringResult ()
#11 0x0808aaad in run_tcl_proc (pDevice=0x8e0ba08, pInterp=0x8d798c0, iNumArgs=2, objv=0x115434c, bIsCommand=0 '\000', pCommand=0x0)
#12 0x08093672 in Tcl_begin_next_state (pDevice=0x8e0ba08, iNextState=RunPoll, pCommand=0x0)
#13 0x08093759 in Tcl_port_thread (dummy=0x8d1cab8)
#14 0x008bcb39 in start_thread (arg=0x1154b70) at pthread_create.c:301
#15 0x0026fc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
(gdb)
(gdb) bt
#0 __kernel_vsyscall () at arch/x86/vdso/vdso32/sysenter.S:49
#1 0x00395871 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0039714a in abort () at abort.c:92
#3 0x080f611c in Tcl_PanicVA ()
#4 0x080f613b in Tcl_Panic ()
#5 0x0810133c in Ptr2Block ()
#6 0x08100e04 in TclpFree ()
#7 0x080b46a7 in Tcl_Free ()
#8 0x080d21b6 in TclExecuteByteCode ()
#9 0x080d1bc1 in TclCompEvalObj ()
#10 0x080fbd5c in TclObjInterpProc ()
#11 0x080b026a in TclEvalObjvInternal ()
#12 0x080d2716 in TclExecuteByteCode ()
#13 0x080d1bc1 in TclCompEvalObj ()
#14 0x080fbd5c in TclObjInterpProc ()
#15 0x080b026a in TclEvalObjvInternal ()
#16 0x080b0517 in Tcl_EvalObjv ()
#17 0x0808aa02 in run_tcl_proc (pDevice=0x94a2500, pInterp=0xac2bba0, iNumArgs=2, objv=0x11b034c, bIsCommand=0 '\000', pCommand=0x0)
#18 0x08093672 in Tcl_begin_next_state (pDevice=0x94a2500, iNextState=RunPoll, pCommand=0x0)
#19 0x08093759 in Tcl_port_thread (dummy=0x9365e98)
#20 0x00356b39 in start_thread (arg=0x11b0b70) at pthread_create.c:301
#21 0x0044dc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
(gdb)
The calls Tcl_IncrRefCount
(actually a simple macro) and Tcl_DecrRefCount
(a complicated macro) are sort of thread safe, but only because each Tcl_Obj
should only ever be accessed from the thread that created it; parallel calls to T_IRC and T_DRC are fine, so long as they're on different values. The plus side of this is that accesses don't need locking (and the memory manager for Tcl_Obj
structures takes advantage of this).
Note that multi-threaded access is not a good plan at all unless you're very careful, since even reader operations like Tcl_GetIntFromObj
can write to the underlying structure if a type transformation needs to be applied. These operations are not locked. Doing it at all needs very intimate knowledge of the current type of the value — not something that you're usually encouraged to think about in Tcl in the first place, though tcl::unsupported::representation
can be helpful with probing this in 8.6 — and some very careful interlocking between the threads so that one isn't writing while the other is peeking. Don't do this at all, while not 100% accurate, is the approach least likely to lead to headaches.
You probably ought to read more about how you're supposed to do it. The ActiveState blog has a reasonable introduction.