I have a program that copies buffers to files, mmap's them back and then checks their contents. Multiple threads can work on the same file. Occasionally, I am getting SIGBUS when reading, but only under load.
The mappings are MAP_PRIVATE and MAP_POPULATE. The crash via SIGBUS occurs after mmap was successful which I do not understand since MAP_POPULATE was used.
Here is a full example (creates files under /tmp/buf_* filled with zeroes), using OpenMP to create more load and concurrent writes:
// Program to check for unexpected SIGBUS
// gcc -std=c99 -fopenmp -g -O3 -o mmap_manymany mmap_manymany.c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#define NBUFS 64
const char bufs[NBUFS][65536] = {{0}};
const char zeros[65536] = {0};
int main()
{
int count = 0;
while ( 1 )
{
void *mappings[ 1000 ] = {NULL};
#pragma omp parallel for
for ( int i = 0; i < 1000; ++i )
{
// Prepare filename
int bufIdx = i % NBUFS;
char path[ 128 ] = { 0 };
sprintf( path, "/tmp/buf_%0d", bufIdx );
// Write full buffer
int outFd = -1;
#pragma omp critical
{
remove( path );
outFd = open( path, O_EXCL | O_CREAT | O_WRONLY | O_TRUNC, 0644 );
}
assert( outFd != -1 );
ssize_t size = write( outFd, bufs[bufIdx], 65536 );
assert( size == 65536 );
close( outFd );
// Map it to memory
int inFd = open( path, O_RDONLY );
if ( inFd == -1 )
continue; // Deleted by other thread. Nevermind
mappings[i] = mmap( NULL, 65536, PROT_READ, MAP_PRIVATE | MAP_POPULATE, inFd, 0 );
assert( mappings[i] != MAP_FAILED );
close( inFd );
// Read data immediately. Creates occasional SIGBUS but only under load.
int v = memcmp( mappings[i], zeros, 65536 );
assert( v == 0 );
}
// Clean up
for ( int i = 0; i < 1000; ++i )
munmap( mappings[ i ], 65536 );
printf( "count: %d\n", ++count );
}
}
No assert fires for me, but the program always crashes after a few seconds with SIGBUS.
With your current program, it can happen that thread 0 creates /tmp/buf_0
, writes to it and closes it. Then thread 1 removes and creates /tmp/buf_0
, but before thread 1 writes to it, thread 0 opens, maps, and reads from /tmp/buf_0
- and thus tries to access a file does not yet contain 64 kiB data. You get a SIGBUS
.
To avoid that issue, just make unique files / and bufs
for each thread, by using omp_get_thread_num()
instead of bufIdx
.