c++, c++11, parallel-processing, cilk-plus

cilk_for error when parallelizing over std::set: missing operator-()


I was trying to use cilk_for to iterate over a std::set, but it turns out that operator-(..) is not defined for set iterators. The cilk_for tutorial explains the reason but does not give an example of how to handle this case. It says the following should be provided, but I am not sure where and how to supply it:

link is here

difference_type operator-(termination_type, variable_type); // where to use this?

//my example
auto func = std::bind(my_functor, std::placeholders::_1);

std::set<vertex_id> vertex_set;
// fill the vertex_set with some vertex ids

cilk_for (auto it = vertex_set.begin(); it != vertex_set.end(); ++it) {
   func(*it);
}

How and where do I provide the operator-(..) so that the Cilk compiler can process this loop?

The variable_type is std::set<vertex_id>::iterator and the difference_type is ptrdiff_t, but what is the termination_type in their example?


Solution

  • The problem is not with the termination expression, it != vertex_set.end(), which is fine. In this case the termination_type is simply std::set<vertex_id>::iterator.

    The problem is that std::set does not have random-access iterators, and thus does not have an operator- or an operator+=. In other words, you cannot advance the iterator in constant time and you cannot compute the distance between two iterators in constant time. There is no reasonable way to add these missing operators. A cilk_for loop is simply not intended to traverse linear or tree-structured containers like list or set.
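
    To see the gap concretely, here is a minimal sketch (iterator_demo is just a hypothetical host function) contrasting what the compiler can do with a random-access iterator against what a std::set iterator offers:

    #include <iterator>
    #include <set>
    #include <vector>

    void iterator_demo() {
        std::vector<int> v = {1, 2, 3, 4};
        std::set<int>    s = {1, 2, 3, 4};

        // Random-access iterators: constant-time distance and advance.
        auto dist = v.end() - v.begin();  // OK: operator- exists
        auto vit  = v.begin();
        vit += 2;                         // OK: operator+= exists

        // Bidirectional iterators (std::set): no equivalent operators.
        // auto bad = s.end() - s.begin();   // error: no operator- for set iterators
        auto sit = s.begin();
        std::advance(sit, 2);             // compiles, but walks the tree: O(n)
    }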

    To understand why this is so, consider what is needed to execute a loop in parallel. First, you must carve the data set up into roughly-equal chunks, then you must dispatch those chunks to the parallel workers. Each iteration of the loop must be able to immediately go to the chunk of input it is expected to work on. If it needs to linearly traverse the entire container in order to find the start of its chunk, then you have largely defeated the purpose of using a parallel loop.

    There are several ways to improve the situation. The first is to consider whether your set is large enough, and the work you do on each element expensive enough, to bother with parallelism at all. If not, use a serial loop and forget about it. If so, consider using a different data structure. Can you replace the set with a vector-based one? Search around the web and you'll find things like AssocVector in Loki, which has the same interface as std::map but uses a vector in its implementation and therefore has random-access iterators; it should be easy to modify or wrap it to present an std::set interface instead of a std::map one. It will often out-perform std::set or std::map.
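
    If swapping data structures wholesale is too invasive, even a one-time copy into a std::vector can pay off: the copy is a single O(n) pass, after which the cilk_for has the random-access iterators it needs. A minimal sketch, reusing vertex_set and func from the question:

    // One linear pass to lay the elements out with random access...
    std::vector<vertex_id> vertices(vertex_set.begin(), vertex_set.end());

    // ...after which cilk_for works: vector iterators provide operator- and operator+=.
    cilk_for (auto it = vertices.begin(); it != vertices.end(); ++it) {
        func(*it);
    }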

    If my_functor does a significant amount of work, you might be able to amortize the cost of the linear traversal with a simple serial loop containing a spawn:

    // Walk the set serially, but spawn off the work on each element.
    for (auto it = vertex_set.begin(); it != vertex_set.end(); ++it) {
        cilk_spawn func(*it);
    }
    cilk_sync;  // wait for all the spawned calls to finish
    

    Beware, however, that if func() is relatively small, then the overhead of the repeated spawns will have a noticeable negative impact on performance.
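
    A sketch of one possible middle ground is to batch several elements per spawn so that each spawn does enough work to amortize its overhead; batch_size and process_batch below are illustrative names:

    // Process [first, last) serially; one of these runs per spawned task.
    template <typename Iter, typename Func2>
    void process_batch(Iter first, Iter last, Func2 f) {
        for (auto it = first; it != last; ++it)
            f(*it);
    }

    // Walk the set once, spawning one task per batch of elements.
    const std::size_t batch_size = 64;  // tuning knob; profile to pick a value
    auto it = vertex_set.begin();
    while (it != vertex_set.end()) {
        auto batch_end = it;
        for (std::size_t k = 0; k < batch_size && batch_end != vertex_set.end(); ++k)
            ++batch_end;
        cilk_spawn process_batch(it, batch_end, func);
        it = batch_end;
    }
    cilk_sync;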

    Alternatively, you can use a cilk_for and iterate over indices instead of iterators, as in the following code:

    cilk_for (std::size_t i = 0; i < vertex_set.size(); ++i) {
        auto it = vertex_set.begin();
        std::advance(it, i);  // Slow step! Linear in i, so O(n^2) work overall.
        func(*it);
    }
    

    If your set is big enough, you can reduce the number of calls to std::advance by pre-chunking the set into a vector of iterators. Then you can iterate over the chunks in parallel:

    // Each element of this vector is the start of a chunk of the set.
    std::vector<std::set<vertex_id>::iterator> chunks;
    // Create 1000 chunks (this assumes vertex_set has at least 1000 elements).
    const auto chunk_size = vertex_set.size() / 1000;
    auto chunk = vertex_set.begin();
    for (int i = 0; i < 1000; ++i) {
        chunks.push_back(chunk);
        std::advance(chunk, chunk_size);
    }
    chunks.push_back(vertex_set.end());  // the last chunk absorbs any remainder

    // Now, process the chunks in parallel
    cilk_for (int i = 0; i < 1000; ++i) {
        // Iterate over elements in the chunk.
        for (auto it = chunks[i]; it != chunks[i + 1]; ++it)
            func(*it);
    }
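
    A further refinement, sketched below, is to derive the chunk count from the runtime's worker count (via __cilkrts_get_nworkers() from <cilk/cilk_api.h>) instead of hard-coding 1000, and to fall back to a serial loop when the set is too small to be worth chunking; the wrapper name process_set and the reuse of vertex_id are only for illustration:

    #include <cilk/cilk.h>
    #include <cilk/cilk_api.h>
    #include <cstddef>
    #include <iterator>
    #include <set>
    #include <vector>

    template <typename Func>
    void process_set(const std::set<vertex_id>& vertex_set, Func func)
    {
        // A few chunks per worker gives the scheduler room to balance load.
        const std::size_t n_chunks =
            static_cast<std::size_t>(__cilkrts_get_nworkers()) * 4;
        const std::size_t chunk_size = vertex_set.size() / n_chunks;

        if (chunk_size == 0) {
            // Too small to be worth chunking: run serially.
            for (const auto& v : vertex_set)
                func(v);
            return;
        }

        // One linear pass to record the chunk boundaries.
        std::vector<std::set<vertex_id>::const_iterator> chunks;
        chunks.reserve(n_chunks + 1);
        auto chunk = vertex_set.begin();
        for (std::size_t i = 0; i < n_chunks; ++i) {
            chunks.push_back(chunk);
            std::advance(chunk, chunk_size);
        }
        chunks.push_back(vertex_set.end());  // the last chunk absorbs the remainder

        // Process the chunks in parallel.
        cilk_for (std::size_t i = 0; i < n_chunks; ++i) {
            for (auto it = chunks[i]; it != chunks[i + 1]; ++it)
                func(*it);
        }
    }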
    

    There are a lot of variations on the above two themes, depending on your situation. Make sure to do some profiling and timing to help you choose the best approach.

    I'll close by mentioning that a variation on cilk_for is being considered that would operate on any data structure that can be subdivided recursively (as a tree can). Such a variation would solve your problem directly, but I can make no promise on when such a thing might be available.