c++ multithreading googletest tbb thread-sanitizer

tbb's private_server and false positive ThreadSanitizer data races


We are getting false positive ThreadSanitizer (tsan) data race warnings frequently but inconsistently. It is well known that tsan can report false positives, some of which can be suppressed via the TSAN_OPTIONS environment variable. However, one particular class of warnings we encounter appears to be tied to the use of tbb::detail::r1::rml::private_server by Intel's Threading Building Blocks (tbb), and it seems preventable if we could somehow control when this private_server instance is stopped. Here is one such false positive tsan data race warning, encountered during a Google Test run:

WARNING: ThreadSanitizer: data race (pid=5244)
  Write of size 1 at 0x7ffda4d64fd8 by main thread:
    #0 std::shared_lock<std::shared_mutex>::shared_lock(std::shared_mutex&, std::defer_lock_t) /usr/local/foo-deps/20220316/include/c++/9.4.0/shared_mutex:639 (FooTest+0x68d162)
    #1 FooProxy::buildTranslationMapToOtherProxy(FooProxy*, std::vector<foo::StringOpInfo, std::allocator<foo::StringOpInfo> > const&) const /home/jenkins-slave/workspace/core-tsan-gcc/Foo/FooProxy.cpp:323 (FooTest+0x68d162)
    #2 FooProxy_BuildTranslationMapToPartialOverlapProxy_Test::TestBody() /home/jenkins-slave/workspace/core-tsan-gcc/Tests/FooTest.cpp:798 (FooTest+0x5c5284)
    #3 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:3968 (FooTest+0x62d798)
    #4 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4004 (FooTest+0x62d798)
    #5 testing::Test::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4043 (FooTest+0x618586)
    #6 testing::TestInfo::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4219 (FooTest+0x6187d4)
    #7 testing::TestSuite::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4351 (FooTest+0x618959)
    #8 testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:6892 (FooTest+0x618e7e)
    #9 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:3968 (FooTest+0x62de38)
    #10 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4004 (FooTest+0x62de38)
    #11 testing::UnitTest::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:6479 (FooTest+0x619440)
    #12 RUN_ALL_TESTS() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gtest/gtest.h:11696 (FooTest+0x5b401a)
    #13 main /home/jenkins-slave/workspace/core-tsan-gcc/Tests/FooTest.cpp:974 (FooTest+0x5b401a)

  Previous read of size 8 at 0x7ffda4d64fd8 by thread T18:
    [failed to restore the stack]

  Location is stack of main thread.

  Location is global '<null>' at 0x000000000000 ([stack]+0x00000001efd8)

  Thread T18 (tid=5264, running) created by main thread at:
    #0 pthread_create ../../.././libsanitizer/tsan/tsan_interceptors.cc:964 (libtsan.so.0+0x2cd6b)
    #1 tbb::detail::r1::rml::private_server::wake_some(int) <null> (FooTest+0x8828ce)
    #2 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) <null> (FooTest+0x88b1c2)
    #3 tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) <null> (FooTest+0x86e74c)
    #4 Foo::getStringViews() const /home/jenkins-slave/workspace/core-tsan-gcc/Foo/Foo.cpp:1869 (FooTest+0x63612c)
    #5 Foo_GetStringViews_Test::TestBody() /home/jenkins-slave/workspace/core-tsan-gcc/Tests/FooTest.cpp:141 (FooTest+0x5c625c)
    #6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:3968 (FooTest+0x62d798)
    #7 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4004 (FooTest+0x62d798)
    #8 testing::Test::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4043 (FooTest+0x618586)
    #9 testing::TestInfo::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4219 (FooTest+0x6187d4)
    #10 testing::TestSuite::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4351 (FooTest+0x618959)
    #11 testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:6892 (FooTest+0x618e7e)
    #12 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:3968 (FooTest+0x62de38)
    #13 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:4004 (FooTest+0x62de38)
    #14 testing::UnitTest::Run() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gmock-gtest-all.cc:6479 (FooTest+0x619440)
    #15 RUN_ALL_TESTS() /home/jenkins-slave/workspace/core-tsan-gcc/ThirdParty/googletest/gtest/gtest.h:11696 (FooTest+0x5b401a)
    #16 main /home/jenkins-slave/workspace/core-tsan-gcc/Tests/FooTest.cpp:974 (FooTest+0x5b401a)

SUMMARY: ThreadSanitizer: data race /usr/local/foo-deps/20220316/include/c++/9.4.0/shared_mutex:639 in std::shared_lock<std::shared_mutex>::shared_lock(std::shared_mutex&, std::defer_lock_t)

(Some names have been altered for anonymity.) Summary of events in chronological order:

  1. Google test Foo.GetStringViews is run (Thread T18 frame #5)
    • During this test, an instance ta of tbb::task_arena calls ta.execute([&] { tbb::parallel_for(...); });.
    • This appears to run tbb::detail::r1::rml::private_server::wake_some(int), which spawns a worker thread (T18) that survives between Google tests.
  2. Google test FooProxy.BuildTranslationMapToPartialOverlapProxy is run (main thread frame #2)
    • This test writes to address 0x7ffda4d64fd8 on the main thread's stack, which was previously read by thread T18 from the earlier test.

Our TSAN_OPTIONS environment variable is set to

suppressions=/path/to/tsan.suppressions, history_size=7, second_deadlock_stack=1, halt_on_error=1

We surmise that the false positive data race warning is due to 3 primary ingredients:

The tbb::detail::r1::rml::private_server thread from the first test remains concurrent with the second test, and it is this overlap that confuses tsan into flagging a data race.

Question(s)

How can the tbb::detail::r1::rml::private_server thread be killed at the beginning or end of each test?

Alternatively, if that's not possible, is there something that we can add to our tsan.suppressions file or TSAN_OPTIONS environment variable that specifically suppresses this false warning without hiding real data races that may occur?


Solution

  • To kill the tbb::detail::r1::rml::private_server thread after each Google Test, we overrode the test fixture's TearDown() method:

    void TearDown() override {
      // Expected to kill tbb::detail::r1::rml::private_server after each test,
      // which can otherwise trigger false positive tsan data race warnings.
      auto handle = tbb::task_scheduler_handle::get();
      tbb::finalize(handle, std::nothrow_t{});
    }
    

    In our version of TBB we also had to #define TBB_PREVIEW_WAITING_FOR_WORKERS (the oneTBB documentation shows it defined to 1) before the #include <tbb/global_control.h>.

    Credit: Pavel Kumbrasev for the suggestion.
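
A standard-library analogy can illustrate why blocking until the workers terminate helps. The sketch below is not TBB code: LingeringWorker, sum_on_worker, and first_test are hypothetical names, and the worker is a plain std::thread that, like a pooled worker, keeps idling after its work is done. Its finalize() stops and joins the thread, so no worker is left to overlap with whatever runs next. It is only assumed to mirror what the blocking tbb::finalize call is expected to achieve.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a pooled worker thread that outlives the work
// that first spawned it. This is NOT TBB code; all names are illustrative.
class LingeringWorker {
public:
    LingeringWorker() = default;
    LingeringWorker(const LingeringWorker&) = delete;
    LingeringWorker& operator=(const LingeringWorker&) = delete;
    ~LingeringWorker() { finalize(); }

    // Spawn the worker, let it sum `n` ints, and block until the sum is ready.
    // The worker thread itself keeps running after this call returns.
    int sum_on_worker(const int* data, int n) {
        assert(!thread_.joinable() && "this sketch supports one job per worker");
        thread_ = std::thread([this, data, n] {
            int sum = 0;
            for (int i = 0; i < n; ++i) sum += data[i];
            result_.store(sum, std::memory_order_relaxed);
            done_.store(true, std::memory_order_release);
            // Idle like a pooled worker waiting for more work.
            while (!stop_.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
        });
        // Wait for the result only; the thread is deliberately not joined here.
        while (!done_.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        return result_.load(std::memory_order_relaxed);
    }

    // True while the worker thread has been started and not yet joined.
    bool alive() const { return thread_.joinable(); }

    // Analogue of a blocking finalize: ask the worker to stop, then join it.
    void finalize() {
        if (thread_.joinable()) {
            stop_.store(true, std::memory_order_release);
            thread_.join();
        }
    }

private:
    std::atomic<bool> done_{false};
    std::atomic<bool> stop_{false};
    std::atomic<int> result_{0};
    std::thread thread_;
};

// "Test 1": hands a stack buffer to the worker and returns the sum.
int first_test(LingeringWorker& w) {
    int local[3] = {1, 2, 3};
    return w.sum_on_worker(local, 3);
}
```

With a LingeringWorker w, first_test(w) returns 6 and leaves w.alive() true, which corresponds to the worker that lingers into the next test. After w.finalize(), w.alive() is false, and the join gives a race detector an explicit ordering between the worker's accesses and anything the main thread does afterwards.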