phpfpmopcachephp-deployer

What is causing a segfault when Opcache file cache is enabled in a symlink deployment?


I'm using PHP deployer which is a symlink based deployment tool which calls opcache:reset after deployment.

Recently I'm getting a segfault in my PHP-FPM processes after deployment.

This manifests as segfault entires in the PHP logs/memory exhausted warnings being output or 500/503 errors from Apache.

This has worked fine for years, so I'm stumped as to what has caused this to happen now.

I can reproduce this on every page refresh of an affected pool after the first segfault.

If I restart the PHP-FPM process, the page is served correctly the first time and then subsequently I get a segfault.

I suspect this is related to Opcache, since if I delete the Opcache cache files and restart PHP-FPM, the problem goes away. If I disable opcache.file_cache, again there is no problem. Note that I'm not exclusively using the file cache.

When I (unscientifically) checked the opcache version of files reported by PHP during the memory exhaustion error, I noticed they seemed to be truncated or missing data that existed in previous versions of the opcache file where the segfault did not occur. When I removed those specific opcache files and restarted PHP-FPM the problem went away. So it seems, I think, that the opcache files are getting corrupted between deploys.

Environment:

My opcache config:

zend_extension=opcache.so
opcache.enable=1
opcache.enable_cli=0
opcache.memory_consumption=96
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=4096
opcache.max_wasted_percentage=5
opcache.validate_timestamps=1
opcache.revalidate_path=0
opcache.revalidate_freq=2
opcache.max_file_size=0

The core dump:

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/php-fpm...Reading symbols from /usr/lib/debug/usr/sbin/php-fpm.debug...done.
done.
[New LWP 6221]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `php-fpm: pool cpsastag         '.
Program terminated with signal 11, Segmentation fault.
#0  ZEND_BIND_STATIC_SPEC_CV_UNUSED_HANDLER () at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:46883
46883           zval *value;
(gdb) bt
#0  ZEND_BIND_STATIC_SPEC_CV_UNUSED_HANDLER () at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:46883
#1  0x00005559a7261159 in execute_ex (ex=0x0) at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:57657
#2  0x00005559a7266930 in zend_execute (op_array=0x7f03fb075000, return_value=<optimized out>) at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:57957
#3  0x00005559a71e1a63 in zend_execute_scripts (type=type@entry=8, retval=0x7f03fb016330, retval@entry=0x0, file_count=file_count@entry=3) at /usr/src/debug/php-7.4.15/Zend/zend.c:1679
#4  0x00005559a717fd00 in php_execute_script (primary_file=primary_file@entry=0x7ffd44c9b940) at /usr/src/debug/php-7.4.15/main/main.c:2621
#5  0x00005559a6fe582b in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/php-7.4.15/sapi/fpm/fpm/fpm_main.c:1939
(gdb)

SELinux audit.log

type=ANOM_ABEND msg=audit(1614726077.349:20549): auid=4294967295 uid=1010 gid=1011 ses=4294967295 subj=system_u:system_r:httpd_t:s0 pid=825 comm="php-fpm" reason="memory violation" sig=11

I'd really appreciate any pointers on what could be wrong or how to debug further.


Solution

  • Overview

    Our issue was not involving a symlink, but otherwise had similar errors and server configuration.

    The segfault appears related to PHP's opcache files (opcache.file_cache) not being properly cleared, or else not syncing with the in-memory RAM opcache.

    In our case, the opcache reset was triggered immediately after upgrading to Wordpress 6.3.1 and doing a 'Purge all Caches' with W3 Total Cache 2.4.1.

    We do have SELinux enabled, and that's always a possible culprit, but this worked for years with SELinux, so it's unclear if that's contributing or if its logs are just capturing a 'memory violation' from a crash on a file it's monitoring.

    Solution

    Turning off the secondary opcache file-based caching for all sites / PHP-FPM pools by commenting out the opcache.file_cache in each active pool configuration file in /etc/php-fpm.d/. Then, reloading PHP-FPM (and Apache for good measure).

    Secondary file-based opcache (via opcache.file_cache) is not necessary for most smaller servers, so this may be an acceptable solution / fix. You lose the benefits of the file caching, which I believe is mainly for quicker startups on server or service restarts.

    Workarounds

    Symptoms

    Server Configuration

    Diagnosing

    Log Checking Examples

    Notes