I'm trying to debug a program that routinely crashes when started (it eventually starts after a few tries). After compiling with ASAN I get the following trace which shows that the 9 out of 10 crashes are triggered by std::vector<T>::push_back
(take note of line #9 and #15 in the two traces below):
==35520== ERROR: AddressSanitizer: heap-use-after-free on address 0x60520005c37f at pc 0x51cc5c bp 0x7f257ebfc050 sp 0x7f257ebfc048
READ of size 1 at 0x60520005c37f thread T8 (CELOXICA)
#0 0x51cc5b in MappingData** std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<MappingData*>(MappingData* const*, MappingData* const*, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:372
#1 0x51c9f5 in MappingData** std::__copy_move_a<false, MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:390
#2 0x51c502 in MappingData** std::__copy_move_a2<false, MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:428
#3 0x51b916 in MappingData** std::copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:460
#4 0x519852 in MappingData** std::__uninitialized_copy<true>::__uninit_copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:93
#5 0x5158d6 in MappingData** std::uninitialized_copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:117
#6 0x511aeb in MappingData** std::__uninitialized_copy_a<MappingData**, MappingData**, MappingData*>(MappingData**, MappingData**, MappingData**, std::allocator<MappingData*>&) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:258
#7 0x50df13 in MappingData** std::__uninitialized_move_if_noexcept_a<MappingData**, MappingData**, std::allocator<MappingData*> >(MappingData**, MappingData**, MappingData**, std::allocator<MappingData*>&) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:281
#8 0x509202 in std::vector<MappingData*, std::allocator<MappingData*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<MappingData**, std::vector<MappingData*, std::allocator<MappingData*> > >, MappingData* const&) /home/olumide/4.8.5/include/c++/4.8.5/bits/vector.tcc:362
#9 0x5060ac in std::vector<MappingData*, std::allocator<MappingData*> >::push_back(MappingData* const&) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_vector.h:913
#10 0x4f1270 in Queue::publishMappingData(MappingData*) /home/olumide/repo/source/app/src/framework/queue.cpp:149
#11 0x7f258c449cd7 in Manager::communicationThread() (/home/fmeprod/apps/current/celoxica.so+0x2bcd7)
#12 0x7f25995f9e82 in thread_proxy (/home/repo/boost/boost_1_56_x64/lib/libboost_thread.so.1.56.0+0x10e82)
#13 0x7f2596248b87 in __asan::AsanThread::ThreadStart() /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/asan/../../../../libsanitizer/asan/asan_thread.cc:99
#14 0x331ac07aa0 in start_thread (/lib64/libpthread.so.0+0x331ac07aa0)
#15 0x331a8e8bcc in clone (/lib64/libc.so.6+0x331a8e8bcc)
0x60520005c37f is located 2047 bytes inside of 2048-byte region [0x60520005bb80,0x60520005c380)
==35520== AddressSanitizer CHECK failed: ../../../../libsanitizer/asan/asan_allocator2.cc:216 "((id)) != (0)" (0x0, 0x0)
#0 0x7f25962423dd in __asan::AsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/asan/../../../../libsanitizer/asan/asan_rtl.cc:60
#1 0x7f2596249123 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/sanitizer_common/../../../../libsanitizer/sanitizer_common/sanitizer_common.cc:57
#2 0x7f25962356ab in __asan::GetStackTraceFromId(unsigned int, __sanitizer::StackTrace*) /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/asan/../../../../libsanitizer/asan/asan_allocator2.cc:216
#3 0x7f2596246e7a in __asan::DescribeHeapAddress(unsigned long, unsigned long) /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/asan/../../../../libsanitizer/asan/asan_report.cc:342
#4 0x7f2596247f61 in __asan_report_error /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/asan/../../../../libsanitizer/asan/asan_report.cc:693
#5 0x7f2596242763 in __asan_report_load1 /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/asan/../../../../libsanitizer/asan/asan_rtl.cc:226
#6 0x51cc5b in MappingData** std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<MappingData*>(MappingData* const*, MappingData* const*, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:372
#7 0x51c9f5 in MappingData** std::__copy_move_a<false, MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:390
#8 0x51c502 in MappingData** std::__copy_move_a2<false, MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:428
#9 0x51b916 in MappingData** std::copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:460
#10 0x519852 in MappingData** std::__uninitialized_copy<true>::__uninit_copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:93
#11 0x5158d6 in MappingData** std::uninitialized_copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:117
#12 0x511aeb in MappingData** std::__uninitialized_copy_a<MappingData**, MappingData**, MappingData*>(MappingData**, MappingData**, MappingData**, std::allocator<MappingData*>&) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:258
#13 0x50df13 in MappingData** std::__uninitialized_move_if_noexcept_a<MappingData**, MappingData**, std::allocator<MappingData*> >(MappingData**, MappingData**, MappingData**, std::allocator<MappingData*>&) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:281
#14 0x509202 in std::vector<MappingData*, std::allocator<MappingData*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<MappingData**, std::vector<MappingData*, std::allocator<MappingData*> > >, MappingData* const&) /home/olumide/4.8.5/include/c++/4.8.5/bits/vector.tcc:362
#15 0x5060ac in std::vector<MappingData*, std::allocator<MappingData*> >::push_back(MappingData* const&) /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_vector.h:913
#16 0x4f1270 in Queue::publishMappingData(MappingData*) /home/olumide/repo/source/app/src/framework/queue.cpp:149
#17 0x7f258c449cd7 in Manager::communicationThread() (/home/fmeprod/apps/current/celoxica.so+0x2bcd7)
#18 0x7f25995f9e82 in thread_proxy (/home/repo/boost/boost_1_56_x64/lib/libboost_thread.so.1.56.0+0x10e82)
#19 0x7f2596248b87 in __asan::AsanThread::ThreadStart() /home/olumide/tmp/build/gcc-4.8.5/gcc-build-4.8.5/x86_64-unknown-linux-gnu/libsanitizer/asan/../../../../libsanitizer/asan/asan_thread.cc:99
#20 0x331ac07aa0 in start_thread (/lib64/libpthread.so.0+0x331ac07aa0)
#21 0x331a8e8bcc in clone (/lib64/libc.so.6+0x331a8e8bcc)
I cannot post the code because it is too large and the property of my employer nevertheless the essence of the relevant parts of the code is:
# Thread 1
void Manager::communicationThread()
{
MappingData* data = new MappingData(...)
...
m_queue.publishMappingData( data ); // m_queue is available to all threads
// data is not referenced or deallocated
}
void Queue::publishMappingData(MappingData*)
{
...
// m_buffer is a member of type std::vector<MappingData*> m_buffer;
m_buffer.push_back( data );
// contents of m_buffer are ONLY deallocated on shutdown
}
What is odd is that:
m_buffer
by thread 2 are not deleted until the application shuts down.The remaining 1 out 10 crashes occur while iterating over the contents of the same m_buffer
object as shown below
# Thread 3
void Transaction::completion()
{
...
m_queue.publishStatus(); // m_queue is available to all threads
...
}
void Queue::publishStatus()
{
...
for( int i = 0; i < m_buffer.size(); ++i )
{
.. new StatusCode( m_buffer[i]->m_id ); // crashes here
... // m_id is a member of MappingData
}
}
I know that there is basically 0% chance of there being a bug in the standard library but I don't know how to proceed. The only thing I can think of is to compare the differences in widths of pointer starting on line #10 of the trace. I thought this was due to an incompatibility between the libraries but I've used the Linux application file to check whether all the application and shared objects are all 64bit. (They are.) (The differences in widths of pointer starting on line #10 of the trace is due the stack trace omitting leading zeros in hexadecimal addresses.)
Fearing that AddressSanitizer might be crying wolf I decided to revert to compiling without asan and used gdb to debug. Also I built the application with gcc 4.4.7 and 4.8.5 (ancient I know but those are the compilers that we are constrained to use for now, and they've worked well -- until now). Both binaries produce similar traces as the asan-build
gcc 4.4.7
#0 _wordcopy_fwd_aligned (dstp=140736214036472, srcp=140735609323312, len=75535088) at wordcopy.c:101
#1 0x000000331a8839d2 in memmove (dest=0x7fffb40b5fc0, src=<value optimized out>, len=604284864) at memmove.c:73
#2 0x0000000000481d86 in __copy_m<MappingData*> (this=0x7116e0, __position=, __x=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algobase.h:378
#3 __copy_move_a<false, MappingData**, MappingData**> (this=0x7116e0, __position=, __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algobase.h:397
#4 __copy_move_a2<false, MappingData**, MappingData**> (this=0x7116e0, __position=, __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algobase.h:436
#5 copy<MappingData**, MappingData**> (this=0x7116e0, __position=, __x=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algobase.h:468
#6 uninitialized_copy<MappingData**, MappingData**> (this=0x7116e0, __position=, __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_uninitialized.h:92
#7 uninitialized_copy<MappingData**, MappingData**> (this=0x7116e0, __position=, __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_uninitialized.h:116
#8 __uninitialized_copy_a<MappingData**, MappingData**, MappingData*> (this=0x7116e0, __position=, __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_uninitialized.h:256
#9 __uninitialized_move_a<MappingData**, MappingData**, std::allocator<MappingData*> > (this=0x7116e0, __position=, __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_uninitialized.h:266
#10 std::vector<MappingData*, std::allocator<MappingData*> >::_M_insert_aux (this=0x7116e0, __position=, __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/vector.tcc:338
#11 0x0000000000472e91 in push_back (this=0x711590, data=0x7fffb40b5d50) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_vector.h:741
#12 Queue::publishMappingData (this=0x711590, data=0x7fffb40b5d50) at src/framework/queue.cpp:149
gcc 4.8.5
#0 0x000000331a889e1a in _wordcopy_bwd_aligned (dstp=140736349684976, srcp=140736348970864, len=84044416) at wordcopy.c:293
#1 0x000000331a8839ba in memmove (dest=0x7fff940df128, src=<value optimized out>, len=672355336) at memmove.c:99
#2 0x00000000004bd9b8 in MappingData** std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<MappingData*>(MappingData* const*, MappingData* const*, MappingData**) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:372
#3 0x00000000004bd87e in MappingData** std::__copy_move_a<false, MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:390
#4 0x00000000004bd5ac in MappingData** std::__copy_move_a2<false, MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:428
#5 0x00000000004bcf91 in MappingData** std::copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) ()
at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_algobase.h:460
#6 0x00000000004bbeb7 in MappingData** std::__uninitialized_copy<true>::__uninit_copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:93
#7 0x00000000004ba28d in MappingData** std::uninitialized_copy<MappingData**, MappingData**>(MappingData**, MappingData**, MappingData**) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:117
#8 0x00000000004b80dd in MappingData** std::__uninitialized_copy_a<MappingData**, MappingData**, MappingData*>(MappingData**, MappingData**, MappingData**, std::allocator<MappingData*>&) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:258
#9 0x00000000004b5cfc in MappingData** std::__uninitialized_move_if_noexcept_a<MappingData**, MappingData**, std::allocator<MappingData*> >(MappingData**, MappingData**, MappingData**, std::allocator<MappingData*>&) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_uninitialized.h:281
#10 0x00000000004b3239 in std::vector<MappingData*, std::allocator<MappingData*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<MappingData**, std::vector<MappingData*, std::allocator<MappingData*> > >, MappingData* const&) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/vector.tcc:369
#11 0x00000000004b1398 in std::vector<MappingData*, std::allocator<MappingData*> >::push_back(MappingData* const&) () at /home/olumide/4.8.5/include/c++/4.8.5/bits/stl_vector.h:913
#12 0x00000000004a4e43 in Queue::publishMappingData(MappingData*) () at src/framework/queue.cpp:149
What stood out to me is the len variables passed to _wordcopy_bwd_aligned
(gcc 4.8.5) and _wordcopy_fwd_aligned
(gcc 4.4.7) is almost 100 million, while the len passed to memmove
in both cases is over 50 million! (Recall that the vector stores pointers.)
The length passed to the glibc functions is computed by std::__copy_move
and is the pointer difference of the __last
and __first
pointers ultimately passed to it by std::vector<_Tp,_Alloc>::_M_insert_aux
. A quick inspection of this member function template shows that the branch that led to the crash was taken because this->_M_impl._M_finish == this->_M_impl._M_end_of_storage
i.e. the vector is out of memory and had to reallocate and relocate its contents. But this is what the vectors are designed to do. I don't know if the above reasoning is correct bur this is how far I have come.
I do not see any thread synchronization primitives.
What might happen is that m_buffer.push_back
triggers realloc in one thread.
But the other thread, that reads the vector still accesses the old memory region before the vector was reallocated and copied.
In other words is not about MappingData*
pointers, it's about the memory region in vector
class that stores those pointers. This region was deallocated when vector reached its current capacity and then allocated again in thread A
. Thread B
starts accessing the data inside m_buffer[i]
or inside m_buffer.push_back()
and crashes because this memory region does not belong to the process anymore.