I've got a rather complicated program that does a lot of memory allocation, and today by surprise it started segfaulting in a weird way that gdb couldn't pin-point the location of. Suspecting memory corruption somewhere, I linked it against Electric Fence, but I am baffled as to what it is telling me:
ElectricFence Exiting: mprotect() failed:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:99
99 ../sysdeps/i386/i686/multiarch/strlen.S: No such file or directory.
in ../sysdeps/i386/i686/multiarch/strlen.S
#0 __strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:99
#1 0xb7fd6f2d in ?? () from /usr/lib/libefence.so.0
#2 0xb7fd6fc2 in EF_Exit () from /usr/lib/libefence.so.0
#3 0xb7fd6b48 in ?? () from /usr/lib/libefence.so.0
#4 0xb7fd66c9 in memalign () from /usr/lib/libefence.so.0
#5 0xb7fd68ed in malloc () from /usr/lib/libefence.so.0
#6 <and above are frames in my program>
I'm calling malloc with a value of 36, so I'm pretty sure that shouldn't be a problem.
What I don't understand is how it is even possible that I could be trashing the heap in malloc. In reading the manual page a bit more, it appears that maybe I am writing to a free page, or maybe I'm underwriting a buffer. So, I have tried the following environment variables, together and by themselves:
EF_PROTECT_FREE=1
EF_PROTECT_BELOW=1
EF_ALIGNMENT=64
EF_ALIGNMENT=4096
The last two had absolutely no effect.
The first one changed the portions of the stack frame which are in my program (where in my program was executing when malloc was called fatally), but with identical frames once malloc was entered.
The second one changed a bit more; in addition to the crash occurring at a different place in my program, it also occurred in a call to realloc instead of malloc, although realloc is directly calling malloc and otherwise the back trace is identical to above.
I'm not explicitly linking against any other libraries besides fence.
Update: I found several places where it suggests that the message: " mprotect() failed: Cannot allocate memory" means that there is not enough memory on the machine. But I am not seeing the "Cannot allocate memory" part, and ps says I am only using 15% of memory. With such a small allocation (4k+32) could this really be the problem?
I just wasted several hours on the same problem. It turns out that it is to do with the setting in /proc/sys/vm/max_map_count
From the kernel documentation: "This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation."
So you can 'cat' that file to see what it is set to, and then you can 'echo' a bigger number into it. Like this: echo 165535 > /proc/sys/vm/max_map_count
For me, this allowed electric fence to get past where it was before, and start to find real bugs.