c++openmpstdmapclique-problem

OMP and parallel operations on a std::set<...>::iterator


Given a data structure based on the map shown below:

std::map<int, std::set<std::vector<int>>> cliques;

Where the key indicates the size of the vectors contained on it.

At the beginning, the map has only one key (e.g. [3]) which contains the input vectors (e.g. {1, 3, 5} and {2, 4, 6}).

My function takes the vectors stored in the biggest key of the map and decompose them into all the possible combinations featuring less elements and store them in the key corresponding to the size of the new vectors (e.g. [2] = {1,3} {1,5} {3,5} {2,4} {2,6} {4,6} and [1] = {1} {3} {5} {2} {4} {6}) .

I don't know if my solution is the most efficient, but it works pretty well. But since my project is intended to handle a big amount of data, I need to parallelize my code, which led me to the following implementation:

/// Declare data structure
std::map<int, std::set<std::vector<int>>> cliques;

/// insert "input" vectors
cliques[5] = {{1, 3, 5, 7, 9}};
cliques[4] = {{1, 2, 3, 4}};

/// set boundaries
int kMax = 5;
int kMin = 1;

/// enable/disable parallel execution
bool parallelExec = true;

/// "decompose" source vectors:
for (int k = kMax; k > kMin; k--)
{
  std::set<std::vector<int>>::iterator it = cliques[k].begin();
  #pragma omp parallel num_threads(max_threads) if(parallelExec)
  {
    for(int s = 0; s < cliques[k].size(); ++s)
    {
      std::vector<int> clique;
      /// maybe should be "omp critical"?
      /// maybe "clique" should be private? (or is it already??) 
      #pragma omp single
      { 
        clique = *it;
      }
      for (int v = 0; v < clique.size(); ++v)
      {
        int& vertex = clique[v];
        std::vector<int> new_clique;
        std::copy_if(clique.begin(), clique.end(), std::back_inserter(new_clique), [vertex](const int& elem) { return elem != vertex; });
        int kNew = k - 1;
        #pragma omp critical
        { 
          cliques[kNew].insert(new_clique);
        }
      }
      #pragma omp single
      {
        it++;
      }
    }
  }
}

/// Display results
for(int i = cliques.size(); i > 0 ; i--)
{
    auto kSet = cliques[i];
    std::cout << "[" << i << "]: ";  
    for(auto& vec : kSet)
    {
        std::cout << "{";
        for(auto& elem : vec)
        {
            std::cout << elem << " ";  
        }
        std::cout << "} ";
    }
    std::cout << std::endl;
}

The use of "omp parallel" and "omp single" (rather than "omp for") allows to access the data structure safely, while allowing all other operations to run in parallel. The code works almost perfectly, almost... since it miss some (very few) sub-vectors in the final results (which are successfully generated if omp is disabled).

Is there any "OMP expert" able to help me with this problem? Thank you in advance.

---------------

UPDATE

Minimal Complete Verifiable Example


Solution

  • I'm not sure I understand all the subtleties of your algorithm, therefore I cannot be completely sure of my analysis. That disclaimer said, here is what I believe happens:

    1. You are not parallelizing the processing: you don't distribute the work across threads, you only replicate the same work on all threads, which step on each other's toes to write the result back to the same location.
    2. Even that isn't done properly since the increment of the iterator, which is done with an omp single nowait allows for threads to work on the previous iteration since no synchronization on the value of it is performed at this stage. (NB: the omp single without nowait that protects the increment of the iterator upon exit has an implicit barrier that ensures the thread coherent view of this value, so the discrepancy can only be between the current iteration and the previous one)
    3. This cliques[kNew].insert(new_clique); is really the place where all can explode since the accesses to the same location are concurrent, which isn't supported by a standard container. (and which is wrong anyway, to the limit of my understanding)

    So again, please bear in mind my initial disclaimer, but I think that your algorithm is intrinsically very wrong, for many reasons, and that it is only by chance that it gives something anywhere close to what you expect.

    Finally, I was about to propose you an algorithm of mine but since there are so many pieces lacking from your code snippet, I just can't. If you post a proper mcve, then maybe I will.

    Update Based on your code, here is a possible parallel version:

    for (int k = kMax; k > kMin; k--)
    {
        std::set<std::vector<int>>::iterator it = cliques[k].begin();
        for(int s = 0; s < cliques[k].size(); ++s)
        {
            std::vector<int> clique = *it;
            #pragma omp parallel for num_threads(max_threads)
            for (int v = 0; v < clique.size(); ++v)
            {
                int& vertex = clique[v];
                std::vector<int> new_clique;
                std::copy_if(clique.begin(), clique.end(), std::back_inserter(new_clique), [vertex](const int& elem) { return elem != vertex; });
                int kNew = k - 1;
                #pragma omp critical
                cliques[kNew].insert(new_clique);
            }
            it++;
        }
    }