c++ multithreading asynchronous stdasync

Understanding Scope and Lifetime of References in std::async within a Loop


My question centers around the for-loop in the listDirs function, where I am launching async tasks. I am passing path by reference to std::async which then invokes the listDir function in a separate thread.

I am aware that once the for-loop moves to the next iteration, the path variable, which is a const reference to a std::filesystem::path instance in the paths vector, goes out of scope. However, the listDir function's parameter is a reference, which should be bound to path.

My understanding is that even though path goes out of scope in the listDirs function, the actual std::filesystem::path instances in the paths vector persist for the entire duration of the listDirs function, as we're passing by std::ref. But I'm not certain if this understanding is correct.

Can someone please clarify how this works? Specifically:

Does std::ref in std::async ensure that listDir gets a valid reference even when path goes out of scope in the listDirs function? Is there any risk of a dangling reference in this scenario?

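#include <algorithm>
#include <filesystem>
#include <functional>
#include <future>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>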
using Iterator = std::filesystem::directory_iterator;
// The caller of this function is the thread runtime
std::vector<std::string> listDir(const std::filesystem::path& directory)
{
    
    
    std::vector<std::string> files;
    for (Iterator it(directory); it != Iterator(); ++it)
    {
        
        if (it->is_regular_file())
        {
            files.emplace_back(it->path().filename().string());
            
        }
        
    }
    // Returning the local vector by name allows named return value optimization (NRVO);
    // if the compiler does not apply it, the vector is moved rather than copied
    return files;

}

std::vector<std::string> listDirs(const std::vector<std::filesystem::path>& paths)
{
    using Iterator = std::filesystem::directory_iterator;
    std::vector<std::future<std::vector<std::string>>> futures; // listDir returns std::vector<std::string> type
    // iterate over all the directory paths
    for (const std::filesystem::path& path : paths)
    {
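        // launch one task per directory; with the default launch policy the task
        // may run on a new thread or be deferred until get() is called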
        futures.emplace_back(std::async(listDir, std::ref(path)));
    }
    std::vector<std::string> allFiles;
    for (std::future<std::vector<std::string>>& fut : futures)
    {

        std::vector<std::string> files = fut.get(); // RVO
        std::move(files.begin(), files.end(), std::back_inserter(allFiles));

    }
    // Returning the local vector by name allows named return value optimization (NRVO);
    // if the compiler does not apply it, the vector is moved rather than copied
    return allFiles;
}
int main()
{
    std::filesystem::path currentPath("G:\\lesson4");
    std::vector<std::filesystem::path> paths;

    for (Iterator it(currentPath); it!= Iterator(); ++it)
    {
        if (it->is_directory())
        {
            std::cout << it->path() << '\n';
            paths.emplace_back(it->path());
        }
        
    }

    for (const auto& fileName : listDirs(paths))
    {
        std::cout << fileName << std::endl;
    }

}

Solution

  • In your loop, the variable path is a reference. You can think of it as behaving a little like a pointer to the current element, although it is not one.

    for (const std::filesystem::path& path : paths)
    {
        // start each thread using std::async
        futures.emplace_back(std::async(listDir, std::ref(path)));
    }
    

    At the first iteration of your loop, path refers to the first element of the vector paths. At the second iteration, it refers to the second element of the vector. And so on...

    Because the paths vector is neither modified nor destroyed while any of these references are in use (listDirs calls get() on every future before it returns, and the caller's vector outlives that call), this is safe. When you pass std::ref(path) to std::async, the reference wrapper captures the object that path currently refers to, namely the vector element, not the loop variable itself.

    In fact, reference wrappers are typically implemented using a pointer under the hood, because that is the practical way to make a reference behave like an ordinary object that can be copied and stored.

    Even if the loop moves on to the second iteration before the first listDir call has started, the wrapper created in the first iteration still refers to the first element of paths.