I'm using Apache 2.4.56 on Solaris with libapr 1.6.2. On startup it spawns 5 httpd processes. From time to time the server locks up and stops accepting requests; it is not even possible to connect to port 80.
I took a core dump of each of the 5 processes and found that two of them seem to be stuck in the same function, write(). I suppose they are writing events to the log file, but I'm not sure. Here is how I configured logging:
ErrorLog /data/log/HTTP.apache.log
LogLevel info
<IfModule log_config_module>
LogFormat "%h %f %t \t%I \t%O \t%D" common
CustomLog "|/usr/bin/logger -tHTTP -pdaemon.info" common
</IfModule>
Here is an example of the call stack of the first process that seems deadlocked:
------------ lwp# 6 / thread# 6 ---------------
ffffffff730cb048 write (b, ffffffff7e75b9ff, 1)
ffffffff57d28ca8 apr_file_write (0x100267338?, 0xffffffff7e75b9ff?, 0xffffffff7e75b9e0?, 0x0?, 0x0?, 0x0?) + 1c8
ffffffff57d28ff4 apr_file_putc (0x1?, 0x100267338?, 0x0?, 0x0?, 0x1?, 0x0?) + 1c
ffffffff57d392c8 apr_pollset_wakeup (0x100266a50?, 0x0?, 0x0?, 0x0?, 0x100266a18?, 0x0?) + 20
00000001000b4888 TO_QUEUE_APPEND (0x100221228?, 0x10032b700?, 0x0?, 0x0?, 0x100221258?, 0x10032b7b0?) + f0
00000001000b6f5c process_socket (0x1002675c0?, 0x10032b378?, 0x10032b400?, 0x0?, 0x1?, 0x3?) + 8bc
00000001000ba14c worker_thread (0x1002675c0?, 0x100215af0?, 0xffffffff7f582240?, 0xfffc00?, 0x1?, 0x3?) + 494
ffffffff57d407a0 dummy_worker (0x1002675c0?, 0x0?, 0x1?, 0xffffffff57d40788?, 0x0?, 0x0?) + 18
ffffffff730c58b4 _lwp_start (0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
The stack of the second process is exactly the same.
What surprises me is that in libapr's source code for apr_file_write(), there seems to be no locking mechanism around the call to write().
Do you have any idea what could cause the deadlock? Is the problem perhaps coming from the config, which uses CustomLog with a pipe?
The backtrace is not related to logging. It shows a worker (request-processing thread) telling the dedicated listener thread that there is new activity it needs to poll for.
I'd get off APR 1.6.2: https://bz.apache.org/bugzilla/show_bug.cgi?id=68830
The problem is that the write happens over a pipe, so if the other side stops reading from it, the write in the worker thread will block once the pipe's buffer is full.
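To make the pipe behaviour concrete, here is a minimal sketch in plain POSIX C. It is not APR's code; the helper names `set_nonblocking`, `fill_pipe` and `drain_pipe` are made up for illustration. It writes one byte at a time into a pipe that nobody reads, much like the one-byte write() at the top of the backtrace. The descriptors are switched to non-blocking mode only so that the "pipe is full" condition shows up as EAGAIN instead of hanging the demo; on a blocking descriptor that same write would simply never return until the reader drains the pipe.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Write single bytes into the write end of a pipe until the kernel refuses
 * to take more. Expects write_fd to be non-blocking. Returns the number of
 * bytes accepted, or -1 on an unexpected error. On a blocking descriptor,
 * the write that reports EAGAIN here would block instead.
 */
static long fill_pipe(int write_fd)
{
    long accepted = 0;
    char byte = 1;

    for (;;) {
        ssize_t n = write(write_fd, &byte, 1);
        if (n == 1) {
            accepted++;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;                 /* interrupted by a signal, retry */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return accepted;          /* pipe buffer is full */
        return -1;
    }
}

/*
 * Read and discard everything currently buffered in the pipe. Expects
 * read_fd to be non-blocking. Returns the number of bytes drained, or -1
 * on an unexpected error.
 */
static long drain_pipe(int read_fd)
{
    char buf[4096];
    long drained = 0;

    for (;;) {
        ssize_t n = read(read_fd, buf, sizeof buf);
        if (n > 0) {
            drained += n;
            continue;
        }
        if (n == 0)
            return drained;           /* write end was closed */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return drained;           /* nothing left to read */
        return -1;
    }
}
```

To try it, create a pipe with pipe(), call `set_nonblocking` on both ends, then `fill_pipe` on the write end. The value it returns is the pipe capacity on that system, and any further write fails until `drain_pipe` has been called on the read end. That is the situation the worker threads appear to be in: nothing is draining the wakeup pipe, so their one-byte writes cannot complete.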