I have a Python program that calls a shared library (libpq in this case) that itself calls malloc under the hood.
I want to be able to test (i.e. in unit tests) what happens when those calls to malloc fail (e.g. when there isn't enough memory).
How can I force that?
Note: I don't think setting a resource limit on the process using ulimit -d
would work. It would need to be be precise and robust enough to, say, make a single malloc call inside libpq, for example one inside PQconnectdbParams, to fail, but all others to work fine, across different versions of Python, and even different resource usages in the same version of Python.
It's possible, but it's tricky. In summary
You can override malloc
in a shared library, test_malloc_override.so
say, and then (on linux at least) using the LD_PRELOAD
environment variable to load it.
But... Python calls malloc all over the place, and you need those to succeed. To isolate the "right" calls to malloc to fail you can use the glibc functions "backtrace" and "backtrace_symbols" to inspect the stack to see if it's the right one to fail.
This shared library exposes a small API to control which calls to malloc will fail (so it doesn't need to be hard coded in the library)
To allow some calls to malloc to succeed, you need a pointer to the original malloc function. However, to find this you need to call dlsym, which itself can call malloc. So you need to build in a simple allocator inside the new malloc
so these calls (recursive) calls to malloc succeed. Thanks to https://stackoverflow.com/a/10008252/1319998 for this tip.
In more detail:
The shared library code
// In test_override_malloc.c
// Some of this code is inspired by https://stackoverflow.com/a/10008252/1319998
#define _GNU_SOURCE
#include <dlfcn.h>
#include <execinfo.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
// Fails malloc at the fail_in-th call when search_string is in the backtrade
// -1 means never fail
static int fail_in = -1;
static char search_string[1024];
// To find the original address of malloc during malloc, we might
// dlsym will be called which might allocate memory via malloc
static char initialising_buffer[10240];
static int initialising_buffer_pos = 0;
// The pointers to original memory management functions to call
// when we don't want to fail
static void *(*original_malloc)(size_t) = NULL;
static void (*original_free)(void *ptr) = NULL;
void set_fail_in(int _fail_in, char *_search_string) {
fail_in = _fail_in;
strncpy(search_string, _search_string, sizeof(search_string));
}
void *
malloc(size_t size) {
void *memory = NULL;
int trace_size = 100;
void *stack[trace_size];
static int initialising = 0;
static int level = 0;
// Save original
if (!original_malloc) {
if (initialising) {
if (size + initialising_buffer_pos >= sizeof(initialising_buffer)) {
exit(1);
}
void *ptr = initialising_buffer + initialising_buffer_pos;
initialising_buffer_pos += size;
return ptr;
}
initialising = 1;
original_malloc = dlsym(RTLD_NEXT, "malloc");
original_free = dlsym(RTLD_NEXT, "free");
initialising = 0;
}
// If we're in a nested malloc call (the backtrace functions below can call malloc)
// then call the original malloc
if (level) {
return original_malloc(size);
}
++level;
if (fail_in == -1) {
memory = original_malloc(size);
} else {
// Find if we're in the stack
backtrace(stack, trace_size);
char **symbols = backtrace_symbols(stack, trace_size);
int found = 0;
for (int i = 0; i < trace_size; ++i) {
if (strstr(symbols[i], search_string) != NULL) {
found = 1;
break;
}
}
free(symbols);
if (!found) {
memory = original_malloc(size);
} else {
if (fail_in > 0) {
memory = original_malloc(size);
}
--fail_in;
}
}
--level;
return memory;
}
void free(void *ptr) {
if (ptr < (void*) initialising_buffer || ptr > (void*)(initialising_buffer + sizeof(initialising_buffer))) {
original_free(ptr);
}
}
Compiled with
gcc -shared -fPIC test_override_malloc.c -o test_override_malloc.so -ldl
Example Python code
This could go inside the unit tests
# Inside my_test.py
from ctypes import cdll
cdll.LoadLibrary('./test_override_malloc.so').set_fail_in(0, b'libpq.so')
# ... then call a function in the shared library libpq.so
# The `0` above means the very next call it makes to malloc will fail
Run with
LD_PRELOAD=$PWD/test_override_malloc.so python3 my_test.py
(This might all not be worth it admittedly... if Python calls malloc a lot, I wonder if that in most situations it's unlikely that Python will be fine but just the one call in the library will fail)