parallel-processing openmp cilk cilk-plus

Cilk or Cilk++ or OpenMP


I'm creating a multi-threaded application on Linux. Here is the scenario:

Suppose I have x instances of a class BloomFilter and y GB of data (more than the available memory). I need to test membership of this y GB of data in each of the Bloom filter instances. It is fairly clear that parallel programming will help speed up the task; moreover, since I am only reading the data, it can be shared across all processes or threads.

Now I am confused about which one to use: Cilk, Cilk++ or OpenMP (which one is better?). I am also unsure whether to go for multithreading or multiprocessing.


Solution

  • Cilk Plus is the current implementation of Cilk by Intel. Both Cilk Plus and OpenMP are multithreaded environments, i.e., multiple threads are spawned during execution.

    If you are new to parallel programming, OpenMP is probably the better choice, since it makes it easier to parallelize existing sequential code. Do you already have a sequential version of your code?

    OpenMP uses pragmas to tell the compiler which portions of the code have to run in parallel. If I understand your problem correctly, you probably need something like this:

       // data[] holds the n items of the chunk currently in memory (names assumed)
       #pragma omp parallel for firstprivate(array_of_bloom_filters)
       for (long i = 0; i < n; ++i)
          check(data[i], array_of_bloom_filters);
    

    With firstprivate, the Bloom filter instances are replicated in every thread to avoid contention, while the data is shared among the threads.
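To make the pattern above concrete, here is a minimal C++ sketch. It assumes a toy `BloomFilter` class (the asker's real class is not shown), string items, and a helper named `count_hits`; all of these names are hypothetical. It only illustrates how `firstprivate` gives each thread its own copy of the filters while the data chunk stays shared, and it processes a single in-memory chunk rather than the full y GB.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy Bloom filter with two hash functions.
// The real BloomFilter class from the question is not shown, so this is an assumption.
class BloomFilter {
public:
    explicit BloomFilter(std::size_t bits) : bits_(bits, false) {
        assert(bits > 0);
    }

    void add(const std::string& key) {
        bits_[h1(key)] = true;
        bits_[h2(key)] = true;
    }

    // May report false positives, never false negatives.
    bool possibly_contains(const std::string& key) const {
        return bits_[h1(key)] && bits_[h2(key)];
    }

private:
    std::size_t h1(const std::string& k) const {
        return std::hash<std::string>{}(k) % bits_.size();
    }
    std::size_t h2(const std::string& k) const {
        return std::hash<std::string>{}(k + "#") % bits_.size();
    }
    std::vector<bool> bits_;
};

// Counts the (item, filter) pairs that report possible membership
// for one chunk of data that already fits in memory.
long count_hits(const std::vector<std::string>& chunk,
                std::vector<BloomFilter> filters) {
    long hits = 0;
    const long n = static_cast<long>(chunk.size());
    // firstprivate: every thread gets its own copy of `filters`;
    // `chunk` is shared and only read; `hits` is combined by the reduction.
    #pragma omp parallel for firstprivate(filters) reduction(+:hits)
    for (long i = 0; i < n; ++i) {
        for (const BloomFilter& f : filters) {
            if (f.possibly_contains(chunk[i])) {
                ++hits;
            }
        }
    }
    return hits;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC), the loop iterations are divided among threads; without it the pragma is ignored and the same code runs sequentially. Since the filters are only read during membership tests, sharing them instead of copying would likely work as well and save memory; the copy simply mirrors the suggestion above.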

    Update: The paper actually considers an application that is very unbalanced, i.e., different tasks (allocated to different threads) may incur very different workloads. To quote the paper you mentioned: "a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies". Keep in mind that, to balance computation among threads, it is necessary to reduce the task size, which increases the time spent in synchronization. In other words, good load balancing always comes at a cost. The description of your problem is not very detailed, but it seems to me that your problem is quite balanced. If this is not the case, then go for Cilk; its work-stealing approach is probably the best solution for unbalanced workloads.
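The trade-off between task size and balancing cost can also be seen within OpenMP itself. The sketch below uses OpenMP's `schedule(dynamic, chunk)` clause, which is not Cilk's work stealing but exposes the same knob: a smaller chunk balances uneven work better at the price of more scheduling overhead. The functions `uneven_work` and `process_all` and the chunk size of 16 are illustrative assumptions, not part of the original problem.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical uneven workload: the cost of an item grows with its value.
std::uint64_t uneven_work(std::uint64_t item) {
    std::uint64_t acc = 0;
    for (std::uint64_t k = 0; k < item; ++k) {
        acc += k % 7;
    }
    return acc;
}

// Sums uneven_work over all items.
std::uint64_t process_all(const std::vector<std::uint64_t>& items) {
    std::uint64_t total = 0;
    const long n = static_cast<long>(items.size());
    // schedule(dynamic, 16): a thread that finishes early fetches the next
    // 16 iterations. Smaller chunks balance better but synchronize more often.
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += uneven_work(items[i]);
    }
    return total;
}
```

For a balanced workload such as uniform membership tests, the default static schedule is usually enough; the dynamic variant is only worth trying if profiling shows some threads sitting idle.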