c++c++11gccvalgrind

valgrind reporting invalid read with std::string


I'm working on code that runs on a raspberry pi 3. And got the following error on my logging classes.

==1297== Invalid read of size 8
==1297==    at 0x4865D1C: ??? (in /usr/lib/arm-linux-gnueabihf/libarmmem.so)
==1297==  Address 0x4c8d45c is 100 bytes inside a block of size 107 alloc'd
==1297==    at 0x4847DA4: operator new(unsigned int) (vg_replace_malloc.c:328)
==1297==    by 0x49C3D9B: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned int) (in /usr/lib/arm-linux-gnueabihf/libstdc++.so.6.0.22)
==1297==    by 0x4AE65: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (basic_string.tcc:1155)
==1297==    by 0xF82B5: Log::Book::addField(std::unique_ptr<Log::Entry, std::default_delete<Log::Entry> >&, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (LogBook.cpp:149)
==1297==    by 0xF7CCB: Log::Book::record(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > >) (LogBook.cpp:87)

GCC version: gcc version 6.3.0 20170516 (Raspbian 6.3.0-18+rpi1+deb9u1)
valgrind version: valgrind-3.13.0
I can't seem to locate the problem since the function Log::Book::record() get it's value via pass-by-value. I can also say that this error isn't always shown when the function is called. It is deterministic in the sense of on what line the error shows and on what line not. Can anybody direct my in direction of what this problem is and the solution for it? Code snippet below with a comment of the indicated lines.

/** log message */
void Book::record(std::string file, const int line, const unsigned int level, Identifier id, const std::string message,
                  const std::chrono::high_resolution_clock::time_point timeStamp)
{
    if (!(fileLevels & level) && !(consoleLevels & level)) { return; }

    auto now = Time::keeper->now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(timeStamp - Time::globalEpoch);

    //generate message
    auto entry = std::make_unique<Entry>(level);

    // Time since startup
    addField(entry, 0, std::to_string(duration.count()));

    //UTC Time
    addField(entry, 1, now.dateTime());

    // File
    std::string stringFile;
    if (!file.empty())
    {
        stringFile = URL{file}.lastPathComponent();
    }
    addField(entry, 2, stringFile);

    //Line number
    addField(entry, 3, std::to_string(line));

    //ID
    addField(entry, 4, id);

    //Message
    std::string stringMessage;
    if(!message.empty())
    {
        addField(entry, 5, message); //this is line LogBook.cpp:87
    }
    else
    {
        addField(entry, 5, " empty message.");
    }
    *entry << ";";

    //queue message
    this->append(std::move(entry));
}
void Book::addField(std::unique_ptr<Entry> &entry, unsigned int index, const std::string &text)
{
    std::string textOutput;

    if ((spacings.at(index) != 0) && (text.length() > (spacings.at(index) - 1)))
    {
        spacings.at(index) = (uint8_t) (text.length() + 2);
    }

    entry->setWidth(spacings.at(index));

    if(entry->empty())
        textOutput = text;
    else
        textOutput = ";" + text;   //This is line LogBook.cpp:149

    if(!textOutput.empty())
        (*entry) << textOutput;
}

The code where this function gets called and this problem occurs.

auto node = child(items, "item", index);
auto enabled = boolValue(node, "enabled", false);
auto file = pathValue(node, key::path);
auto name = stringValue(node, "name", "");
auto type = stringValue(node, "type");

CLOG(CLOG::WARNING, "Yard item " + name + " not enabled, path:" + file.path());

Update 1: I compile with cmake with the options. And added extra options. These didn't solve the problem.

add_compile_options(-ggdb)
add_compile_options(-O1)

#Extra disable vectorization
add_compile_options(-fno-tree-vectorize)
add_compile_options(-fno-tree-loop-vectorize)
add_compile_options(-fno-tree-slp-vectorize)

Update 2:
I've found an other place where string concatenation is used and valgrind reports te same errors


Update 3:
Some time and interesting discoveries later. Error happens in a shared library libarmmem.so. This gets dynamically loaded and is by this reason always on a different address. Used gdb and valgrind combination to break when the error happens.
gdb loaded shared libraries with starting address.

(gdb) info sharedlibrary
From        To          Syms Read   Shared Object Library
0x0483246c  0x04832750  Yes         /usr/local/lib/valgrind/vgpreload_core-arm-linux.so
0x04846e60  0x04850c10  Yes         /usr/local/lib/valgrind/vgpreload_memcheck-arm-linux.so
0x04863588  0x048672fc  Yes (*)     /usr/lib/arm-linux-gnueabihf/libarmmem.so
...

Error reported by valgrind.

==9442== Invalid read of size 8
==9442==    at 0x4865D34: ??? (in /usr/lib/arm-linux-gnueabi/libarmmem.so)

We know from readelf of libarmmem.so that the .text section begins on 588. and that memcpy sits on 710. The disassembly on this breakpoint shows we are in memcpy on address 0x04863710. If we check the range like : 0x04863588 - 0x04863710 = 188. 188 + 588(starting adress of .text) = 710.
Disassembly shows it happens on a assembly line. vldmia is a instruction for Load Vector Floating Point registers.

0x04865d34 <+9764>: vldmia  r1!, {d9}

No solution yet.


Solution

  • Most probly the code inside libarmem.so has been vectorized in such a way that it realizes that there's a terminating char only after reading full 8-byte chunk. This will not trigger a processor exception (as algorithm ensures that pointer is aligned and thus stays in the same page) but will cause tools like Valgrind to report false positives.

    Problems like this are getting worse over time and making Valgrind less useful in practice. See Valgrind vs Optimising Compilers for an in-depth discussion or this bug in diff for real-world example (or my Debian suppression list for even more examples).