c++gccc++17fedora-25ranged-loops

ranged for loop of variables results in returning address-reference of local variable?


// g++ --std=c++17 test.cpp -I /usr/local/include -L /usr/local/lib -lboost_system -Wall -pedantic -Wreturn-type -Wstrict-aliasing -Wreturn-local-addr -fsanitize=address -g
// LD_LIBRARY_PATH=/usr/local/lib ./a.out

#include <iostream>
#include <boost/filesystem.hpp>

namespace fs = boost::filesystem;

class A {
public:
    fs::path path_;

    const fs::path & path() const { return path_; }
    fs::path & path() { return path_; }
};

class B {
public:
    fs::path root_path_;

    A path_2;
    A path_3;

    const fs::path & operator()() const {
        for ( const auto & path : {
            path_3.path(),
            path_2.path(),
            root_path_
        }) {
            if ( not path.empty() ) {
                return path;
            }
        }
        throw std::logic_error{"using default-constructed B"};
    }
};

int main(int argc, char **argv) {
    B b;
    b.root_path_ = "foo/";
    b.path_2.path() = "foo/bar";
    b.path_3.path() = "foo/baz";

    std::cout << b() << '\n';

    return 0;
}

With the above code which, as best as I am aware, appears to be valid C++. Instead, when invoked, I get garbage output.

g++ does not initially complain, however Address Sanitizer does. g++ finally complains when adding -O2. The warning generated is

test.cpp: In member function ‘const boost::filesystem::path& B::operator()() const’:
test.cpp:31:12: warning: function may return address of local variable [-Wreturn-local-addr]
     return path;
            ^~~~
test.cpp:29:3: note: declared here
   }) {
   ^

Note that I am using:

$ cat /etc/fedora-release 
Fedora release 25 (Twenty Five)
$ g++ --version
g++ (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Note that I solved the error by using a pointer instead.

    const fs::path & operator()() const {
            for ( const auto * path : {
                    &path_3.path(),
                    &path_2.path(),
                    &root_path_
            }) {
                    if ( not path->empty() ) {
                            return *path;
                    }
            }
            throw std::logic_error{"using default-constructed B"};
    }

But, that does leave some questions in my mind:

  1. why does g++ not complain about the problem until -O2 is added?
  2. exactly what about my code is undefined? I would say it's well-defined: B::operator() const is... well... const. That should mean that the objects used inside it are either locals or const members. It accesses const members. It constructs a local variable const auto & which... should reference a const member field. Exactly what causes it to bind to temporary instead?

Solution

    1. A compiler is not obligated to issue a diagnostic for undefined behavior. If the compiler can detect code that's syntactically valid, but results in undefined behavior, and then complains about it, that's just icing on top of the cake. gcc's -O2 turns on additional optimizations, and does additional code analysis; as such gcc can sometimes detect undefined behavior only if optimizations are enabled.

    2. It appears that your range iteration is over a temporary std::initializer_list. The range iteration variable is a reference into the initializer list. As such, the function ends up returning a reference to a temporary, which is what gcc is barking about, here. Since the temporary gets destroyed when the method returns, the method ends up returning a reference to a destroyed object. Any use of that reference comprises undefined behavior.

    When you convert the temporary range to a list of pointers, you are iterating by value, and you are not returning a reference to a temporary, but rather derefencing the value in the range, which is a perfectly kosher pointer.

    Pay attention to the following blurb from here:

    The underlying array is not guaranteed to exist after the lifetime of the original initializer list object has ended. The storage for std::initializer_list is unspecified (i.e. it could be automatic, temporary, or static read-only memory, depending on the situation).

    The lifetime if the initializer list bound to the range iteration ends at the conclusion of the iteration. Which includes returning from the method. As such, the initializer list no longer exists, and you just returned a dangling reference.