
What is causing this strange memory corruption in my test PHP extension?


I recently needed a list of compiled-in signal names so I could print nice messages like "Interrupted by SIGINT (2)".

get_defined_constants() is unusable for this as it jumbles SIGINT, SIGTRAP etc in amongst totally unrelated definitions (with the same integer values).

The signal names map to different values depending on the OS, and sometimes not all of them are compiled into PHP, so the most straightforward clean solution would be a new function that just returns an array of compiled-in signal names.
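For illustration only, here is a rough plain-C sketch of what "compiled-in signal names" could look like as a static table. It does not use the PHP API; `signal_entry`, `compiled_signals`, `signal_name` and `signal_number` are hypothetical names, and only a handful of signals are listed. Each entry is guarded by `#ifdef`, so a signal the platform headers do not define is simply left out.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type: one compiled-in signal name and its number. */
struct signal_entry {
    const char *name;
    int number;
};

/* Only signals that the platform headers define end up in the table. */
static const struct signal_entry compiled_signals[] = {
#ifdef SIGHUP
    { "SIGHUP", SIGHUP },
#endif
#ifdef SIGINT
    { "SIGINT", SIGINT },
#endif
#ifdef SIGTRAP
    { "SIGTRAP", SIGTRAP },
#endif
#ifdef SIGTERM
    { "SIGTERM", SIGTERM },
#endif
};

static const size_t compiled_signal_count =
    sizeof compiled_signals / sizeof compiled_signals[0];

/* Returns the name for signo, or NULL if it is not in the table. */
const char *signal_name(int signo)
{
    for (size_t i = 0; i < compiled_signal_count; i++) {
        if (compiled_signals[i].number == signo) {
            return compiled_signals[i].name;
        }
    }
    return NULL;
}

/* Returns the number for a signal name, or -1 if it is not in the table. */
int signal_number(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < compiled_signal_count; i++) {
        if (strcmp(compiled_signals[i].name, name) == 0) {
            return compiled_signals[i].number;
        }
    }
    return -1;
}
```

With a table like this, a message such as "Interrupted by SIGINT (2)" could be built from `signal_name(signo)` and `signo`, and the numbers would be whatever the local headers say they are.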

Hmm... a function that returns a static array back to PHP userspace... that sounds like a really good first sourcecode-hacking project, right?

Nope :)


The code below (a bit further down) is a super-minimized testcase that illustrates the very strange brick wall I've crashed into.

I have a GINIT function that initializes an extension global, test_array, as an array. I then fill it with some entries using add_assoc_long(), exactly as my changes to pcntl would, except that here sprintf() generates dummy strings such as !!!, """, ### for the array keys.

I then have a demo function test_test1() that ZVAL_COPYs the pre-built test_array to return_value.

Drumroll please; behold what happens when I try to print_r() the result:

Array
(
    [PWD] => 0
    [i336] => 1
    [LOGNAME] => 2
    [tty] => 3
    [HOME] => 4
    [LANG] => 5
    [user] => 6
    [xterm] => 7
    [TERM] => 8
    [i336] => 9
    [USER] => 10
    [:0] => 11
    [DISPLAY] => 12
    [SHLVL] => 13
    [9:22836] => 14
    [PATH] => 15
    [111] => 16
    [222] => 17
    [333] => 18
    [444] => 19
    [555] => 20
    [666] => 21
    [777] => 22
    [888] => 23
    [999] => 24
    [HG] => 25
    [MAIL] => 26
    [OLDPWD] => 27
    [] => 28
    [] => 29
    [] => 30
    [STDIN] => 31
    [STDOUT] => 32
    [STDERR] => 33
    [print_r] => 34
    [DDD] => 35
    [EEE] => 36
    [FFF] => 37
    [GGG] => 38
    [HHH] => 39
    [III] => 40
    [JJJ] => 41
    [KKK] => 42
    [LLL] => 43
    [MMM] => 44
    [NNN] => 45
    [OOO] => 46
    [PPP] => 47
    [QQQ] => 48
    [RRR] => 49
<<snipped>>

What's really weird is that entries 0 to 15 are corrupt; entries 16 to 24 are fine; entries 25 to 34 are corrupt; entries 35 on are fine.

0-15 / 16-24 makes a weird kind of sense; 25-34 / 35-∞ does not.
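The 0-15 / 16-24 split is at least consistent with how array keys are stored. Assuming add_assoc_long() goes through the symbol-table flavour of the hash API, keys that look like canonical decimal integers are converted to integer keys, so "111" through "999" would never exist as string keys at all, while "000" (entry 15) has leading zeros and stays a string. The sketch below is a simplified stand-in for that check, not the engine's code; `looks_like_integer_key` is a hypothetical helper that ignores negative numbers and overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-in for a numeric-key check (hypothetical helper):
 * true when key is a decimal integer written without leading zeros.
 * Negative numbers and integer overflow are deliberately ignored.
 */
bool looks_like_integer_key(const char *key)
{
    assert(key != NULL);

    if (key[0] == '\0') {
        return false;               /* empty string is not a number */
    }
    if (key[0] == '0') {
        return key[1] == '\0';      /* "0" qualifies, "000" does not */
    }
    for (size_t i = 0; key[i] != '\0'; i++) {
        if (key[i] < '0' || key[i] > '9') {
            return false;           /* any non-digit keeps it a string */
        }
    }
    return true;
}
```

Under that assumption, only the string keys carry a pointer to separately allocated key memory, which is the part that can end up showing unrelated text.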

In any case, if I replace test_test1 with the following (slight modification of the code from the GINIT function):

    zval test;
    array_init(&test);

    for (int i = 0; i < 80; i++) {
        char buf[4];
        sprintf(buf, "%1$c%1$c%1$c", i+33);
        add_assoc_long(&test, buf, i);
    }

    ZVAL_COPY_OR_DUP(return_value, &test);

    zval_ptr_dtor(&test);

I get the somewhat more expected

Array
(
    [!!!] => 0
    ["""] => 1
    [###] => 2
    [$$$] => 3
    [%%%] => 4
    [&&&] => 5
    ['''] => 6
    [(((] => 7
    [)))] => 8
    [***] => 9
    [+++] => 10
    [,,,] => 11
    [---] => 12
    [...] => 13
    [///] => 14
    [000] => 15
    [111] => 16
    [222] => 17
    [333] => 18
    [444] => 19
    [555] => 20
    [666] => 21
    [777] => 22
    [888] => 23
    [999] => 24
    [:::] => 25
    [;;;] => 26
    [<<<] => 27
    [===] => 28
<<snipped>>

Besides some hints about what I'm doing wrong (I know I've got something backwards... :) ), I would very much like to understand why PHP is dumping portions of what appears to be random environment variables into my array!


The main reason I've halted my own exploration/solving process and posted this question is my awareness that I don't know what I don't know, combined with the fact that I've no idea where to turn to try to resolve this.

There are an increasing number of resources offering PHP documentation, but unfortunately figuring out how to do simple tasks seems to require a lot of piecing-together of details from disparate sources (I'm stuck on something that honestly seems quite simple on the surface).

I also have questions about how up-to-date what I'm reading actually is.

An example: the ZEND_MODULE_GLOBALS_ACCESSOR() macro, used for thread-safe access to per-module global values, is used 37 times (by just under half the contents of ext/, it looks like). And yet all of the information I have read, including on sites like phpinternals.net and phpinternalsbook.net, specifies a hard requirement of including a certain 5-line #define in order to set up access to module globals. I only stumbled on the aforementioned macro, which implements that #define in PHP itself so nobody has to write it by hand anymore, by reading the source code.

I can completely accept that things aren't in exact sync - and that maybe that macro is new.

But where do I go for updated reference information that answers the questions I have?

Genuine question.


I've included config.m4 below, so this could be compiled for testing:

php_test.h:

#ifndef PHP_TEST_H
# define PHP_TEST_H

extern zend_module_entry test_module_entry;
# define phpext_test_ptr &test_module_entry

# define PHP_TEST_VERSION "0.1.0"

ZEND_BEGIN_MODULE_GLOBALS(test)
    zval test_array;
ZEND_END_MODULE_GLOBALS(test)

# if defined(ZTS) && defined(COMPILE_DL_TEST)
ZEND_TSRMLS_CACHE_EXTERN()
# endif


ZEND_DECLARE_MODULE_GLOBALS(test)

#endif  /* PHP_TEST_H */

test.c:

#ifdef HAVE_CONFIG_H
# include "config.h"
#endif

#include "php.h"
#include "ext/standard/info.h"
#include "php_test.h"

PHP_FUNCTION(test_test1)
{
    ZVAL_COPY(return_value, &ZEND_MODULE_GLOBALS_ACCESSOR(test, test_array));   
}

PHP_RINIT_FUNCTION(test)
{
#if defined(ZTS) && defined(COMPILE_DL_TEST)
    ZEND_TSRMLS_CACHE_UPDATE();
#endif

    return SUCCESS;
}

PHP_MINIT_FUNCTION(test)
{
    return SUCCESS;
}


PHP_GSHUTDOWN_FUNCTION(test)
{ }

PHP_GINIT_FUNCTION(test)
{

    // Thanks to #php.pecl on efnet for pointing me in the direction of `GINIT`.
    // I'd seriously hit my SIGSEGV limit, and really appreciated the valid pointers (punintended).

    array_init(&ZEND_MODULE_GLOBALS_ACCESSOR(test, test_array));

    for (int i = 0; i < 80; i++) {
        char buf[4];
        sprintf(buf, "%1$c%1$c%1$c", i+33);
        add_assoc_long(&ZEND_MODULE_GLOBALS_ACCESSOR(test, test_array), buf, i);
    }

    return SUCCESS;

}

PHP_MINFO_FUNCTION(test)
{
    php_info_print_table_start();
    php_info_print_table_header(2, "test support", "enabled");
    php_info_print_table_end();
}

ZEND_BEGIN_ARG_INFO(arginfo_test_test1, 0)
ZEND_END_ARG_INFO()

ZEND_BEGIN_ARG_INFO(arginfo_test_test2, 0)
    ZEND_ARG_INFO(0, str)
ZEND_END_ARG_INFO()

static const zend_function_entry test_functions[] = {
    PHP_FE(test_test1, arginfo_test_test1)
    PHP_FE_END
};

zend_module_entry test_module_entry = {
    STANDARD_MODULE_HEADER,
    "test",                     /* Extension name */
    test_functions,             /* zend_function_entry */
    PHP_MINIT(test),            /* PHP_MINIT - Module initialization */
    NULL,                       /* PHP_MSHUTDOWN - Module shutdown */
    PHP_RINIT(test),            /* PHP_RINIT - Request initialization */
    NULL,                       /* PHP_RSHUTDOWN - Request shutdown */
    PHP_MINFO(test),            /* PHP_MINFO - Module info */
    PHP_TEST_VERSION,           /* Version */
    PHP_MODULE_GLOBALS(test),
    PHP_GINIT(test),
    PHP_GSHUTDOWN(test),
    NULL,                       /* PRSHUTDOWN() */
    STANDARD_MODULE_PROPERTIES_EX
};

#ifdef COMPILE_DL_TEST
# ifdef ZTS
ZEND_TSRMLS_CACHE_DEFINE()
# endif
ZEND_GET_MODULE(test)
#endif

config.m4:

PHP_ARG_ENABLE([test],
  [whether to enable test support],
  [AS_HELP_STRING([--enable-test],
    [Enable test support])],
  [no])

if test "$PHP_TEST" != "no"; then
  AC_DEFINE(HAVE_TEST, 1, [ Have test support ])

  PHP_NEW_EXTENSION(test, test.c, $ext_shared)
fi

Solution

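  • GINIT is invoked prior to request startup, but array_init() and add_assoc_long() (like most other APIs) use the per-request allocator. Memory obtained from that allocator is only valid for the duration of a request, so an array built in GINIT ends up referring to memory that is later reused for other data, which is why unrelated strings such as environment variable names show up as keys.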

    You could use persistent allocations instead (by using lower-level zend_hash and zend_string APIs and passing persistent=1 flags), but you still wouldn't be allowed to return such an array from a PHP function, because this violates the PHP memory model (you are not permitted to change the refcount of a persistent value during a request).

    If you want a global to hold a value allocated with the per-request allocator, you need to create it inside RINIT and destroy it inside RSHUTDOWN. These handlers are invoked as part of each request.

    Though for your particular use-case I would recommend not using globals at all, and instead simply constructing the array anew each time the function is called. It is not performance-critical.
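To make the failure mode concrete, here is a toy model in plain C rather than real Zend code. `struct arena`, `arena_strdup`, `arena_reset` and `build_keys` are all hypothetical names, and the bump allocator only loosely imitates a per-request heap: pointers handed out before a reset keep pointing into memory that later allocations overwrite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy bump allocator standing in for a per-request heap (hypothetical). */
struct arena {
    char buf[256];
    size_t used;
};

/* Copies s into the arena and returns a pointer to the copy. */
const char *arena_strdup(struct arena *a, const char *s)
{
    size_t n = strlen(s) + 1;
    assert(a->used + n <= sizeof a->buf);   /* toy model: no growth */

    char *p = a->buf + a->used;
    memcpy(p, s, n);
    a->used += n;
    return p;
}

/* Ends a "request": the memory is handed back and will be reused. */
void arena_reset(struct arena *a)
{
    a->used = 0;
}

/* Builds three-character keys ("!!!", "\"\"\"", "###", ...) in the arena. */
void build_keys(struct arena *a, const char *keys[], int count)
{
    for (int i = 0; i < count; i++) {
        char c = (char)(i + 33);
        char buf[4] = { c, c, c, '\0' };
        keys[i] = arena_strdup(a, buf);
    }
}
```

In this model, calling build_keys() once at "startup", then arena_reset() followed by arena_strdup(&a, "PWD"), makes the first saved key read as PWD, much like the corrupted print_r() output. Calling build_keys() again inside the current "request" yields the expected keys, which mirrors the advice to rebuild the array on each call.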