parallel-processing openmp race-condition false-sharing

OpenMP: Do I have a race condition or false sharing?


I'm trying to write code for matrix multiplication. As far as I understand OpenMP and parallel programming, this code may suffer from a race condition.

#pragma omp parallel
#pragma omp for
    for (int k = 0; k < size; k++) {
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

Do I get rid of it if I put #pragma omp atomic before the write to the c matrix, or by adding private(i) to the second #pragma? Also, is it possible to make this code false-sharing free? If yes, how?


Solution

  • A race condition occurs when two or more threads access the same memory location and at least one of them writes to it. The line c[i][j] += ... causes a data race in your code: because the parallelized loop is over k, different threads update the same element c[i][j] concurrently. The solution is to reorder your nested loops (use the order i, j, k) and introduce a temporary variable to accumulate the dot product:

    #pragma omp parallel for
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                double tmp = 0; // change its type as needed
                for (int k = 0; k < size; k++) {
                    tmp += a[i][k] * b[k][j];
                }
                c[i][j] = tmp;  // note that += was used in your original code
            }
        }
    

    Note that your code will be faster if you first calculate the transpose of matrix b, so that the innermost loop reads both operands contiguously. For more details read this.
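
    The transpose idea can be sketched as follows. This is my illustration, not code from the answer; the function name matmul_transposed and the pointer-to-pointer matrix layout are assumptions:

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Sketch: multiply c = a * b after transposing b, so the innermost
       loop walks both a[i][...] and bt[j][...] with stride-1 access. */
    void matmul_transposed(int size, double **a, double **b, double **c) {
        /* bt[j][k] = b[k][j]; one extra O(size^2) pass, amortized over
           the O(size^3) multiplication work */
        double **bt = malloc(size * sizeof *bt);
        for (int j = 0; j < size; j++) {
            bt[j] = malloc(size * sizeof **bt);
            for (int k = 0; k < size; k++)
                bt[j][k] = b[k][j];
        }
        #pragma omp parallel for
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                double tmp = 0;
                for (int k = 0; k < size; k++)
                    tmp += a[i][k] * bt[j][k]; /* both rows contiguous */
                c[i][j] = tmp;
            }
        }
        for (int j = 0; j < size; j++) free(bt[j]);
        free(bt);
    }
    ```

    Each thread writes only its own rows of c, so there is no data race, and the contiguous reads make much better use of the cache than the strided b[k][j] access.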

    UPDATE:

    If you need to maintain the order of the loops, there are two possibilities (but these solutions may be slower than the serial code):

    1. Use an atomic operation (i.e. #pragma omp atomic). In this case, false sharing can also be a problem.

    2. If your stack is large enough to store a private copy of the matrix for every thread, a better alternative is to use a reduction (OpenMP 4.5 or later): #pragma omp parallel for reduction(+:c[:size][:size]). (Another alternative is to do the reduction manually; in that case you can allocate the per-thread matrices used for the reduction on the heap.)
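
    Option 1 might be sketched like this (my illustration; the function name matmul_atomic and the zero-initialized c are assumptions):

    ```c
    #include <assert.h>

    /* Sketch of option 1: keep the original k-outer loop order and make
       each update of c atomic. Correct, but every += is serialized per
       element, and threads still write to nearby elements of c, so false
       sharing remains; usually slower than reordering the loops. */
    void matmul_atomic(int size, double **a, double **b, double **c) {
        /* c is assumed to be zero-initialized on entry */
        #pragma omp parallel for
        for (int k = 0; k < size; k++)
            for (int i = 0; i < size; i++)
                for (int j = 0; j < size; j++) {
                    double prod = a[i][k] * b[k][j];
                    #pragma omp atomic
                    c[i][j] += prod; /* read-modify-write done atomically */
                }
    }
    ```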
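
    Option 2 might be sketched as follows, assuming the matrices are stored as flat size*size arrays so that an OpenMP 4.5 array-section reduction applies (the function name matmul_reduction and the flat layout are my assumptions, not from the original answer):

    ```c
    #include <assert.h>

    /* Sketch of option 2: each thread accumulates into its own private
       copy of c, and the copies are summed once at the end -- no atomics,
       and no false sharing on c during the parallel loop. */
    void matmul_reduction(int size, const double *a, const double *b,
                          double *c) {
        /* c is assumed to be zero-initialized; c[:size*size] requires
           OpenMP 4.5 array-section reductions */
        #pragma omp parallel for reduction(+: c[:size*size])
        for (int k = 0; k < size; k++)
            for (int i = 0; i < size; i++)
                for (int j = 0; j < size; j++)
                    c[i*size + j] += a[i*size + k] * b[k*size + j];
    }
    ```

    The trade-off is memory: every thread holds a full size*size private copy of c, which is why the answer notes the stack (or heap, for a manual reduction) must be large enough.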