I am looking into SIGBUS on unaligned data access. I am tracking down one of these errors and would like to know how it can happen on a Sitara AM335x. Can someone please give me example code that demonstrates it or reliably triggers it?
Adding code snippet:
int Read( void *value, uint32_t *size, const uint32_t baseAddress )
{
    uint8_t *userDataAddress = (uint8_t *)( baseAddress + sizeof( DBANode ));
    memcpy( value, userDataAddress, ourDataSize );
    *size = ourDataSize;
    return 0;
}
DBANode is a class whose objects are 20 bytes. baseAddress is the address returned by an mmap of a shared memory file, which also holds a DBANode object. It is cast to uint32_t so that the arithmetic can be done.
This is the disassembly of the section:
91a8: e51b3010 ldr r3, [fp, #-16]
91ac: e5933000 ldr r3, [r3]
91b0: e51b0014 ldr r0, [fp, #-20] ; 0xffffffec
91b4: e51b1008 ldr r1, [fp, #-8]
91b8: e1a02003 mov r2, r3
91bc: ebffe72b bl 2e70 <memcpy@plt>
91c0: e51b3010 ldr r3, [fp, #-16]
91c4: e5932000 ldr r2, [r3]
91c8: e51b3018 ldr r3, [fp, #-24] ; 0xffffffe8
91cc: e5832000 str r2, [r3]
00002e70 <memcpy@plt>:
2e70: e28fc600 add ip, pc, #0, 12
2e74: e28cca08 add ip, ip, #8, 20 ; 0x8000
2e78: e5bcf868 ldr pc, [ip, #2152]! ; 0x868
When the exact same code base was rebuilt, the problem just disappeared. Can gcc generate two different instruction sequences from the same source when the same -O0 optimization level is specified?
I also diffed the objdump output of the library .so files from both builds, and they are exactly the same. The API is used quite often, but the crash only happens after prolonged use over a few days, even though I am reading the same node every 500 ms. So it is not consistent. Should I be looking at pointer corruption?
It turns out that baseAddress is the issue. As I mentioned, it comes from an mmap of a shared memory object, and that mmap can fail. A failed mmap returns MAP_FAILED, i.e. (void *)-1, but the code was checking for NULL and then went on to access -1, i.e. 0xFFFFFFFF, causing a SIGBUS. The SIGBUS has code 1 when the access goes through memcpy. Any other access, such as direct byte addressing, gives a SIGBUS with code 3.
I am still not sure why this triggers SIGBUS instead of SIGSEGV. Shouldn't this be a memory access violation instead? Here is an example:
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>

int main(int argc, char **argv)
{
    // Shared memory example
    const char *NAME = "SharedMemory";
    const int SIZE = 10 * sizeof(uint8_t);
    uint8_t src[] = {0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88,0x99,0x00};
    int shm_fd = -1;
    shm_fd = shm_open(NAME, O_CREAT | O_RDONLY, 0666);
    ftruncate(shm_fd, SIZE);
    // Map shared memory segment to address space
    uint8_t *ptr = (uint8_t *) mmap(0, SIZE, PROT_READ | PROT_WRITE | _NOCACHE, MAP_SHARED, shm_fd, 0);
    if(ptr == MAP_FAILED)
    {
        std::cerr << "ERROR in mmap()" << std::endl;
        // return -1;  // commented out so that execution continues with the failed pointer
    }
    printf("ptr = %p\n", (void *)ptr);  // %p is the correct conversion for a pointer
    std::cout << "Now storing data to mmap() memory" << std::endl;
#if 0
    ptr[0] = 0x11;
    ptr[1] = 0x22;
    ptr[2] = 0x33;
    ptr[3] = 0x44;
    ptr[4] = 0x55;
    ptr[5] = 0x66;
    ptr[6] = 0x77;
    ptr[7] = 0x88;
    ptr[8] = 0x99;
    ptr[9] = 0x00;
#endif
    memcpy(ptr, src, SIZE); // causes SIGBUS code 1
    shm_unlink(NAME);
}
I still do not know why mmap is failing on a shared memory object, even though I have 100 MB of RAM available, all my resource limits are set to unlimited, and over 400 file descriptors (fds) are still available out of a limit of 1000.