c++ parallel-processing openmp

How to print from a parallel loop


How can I print from a loop that runs in parallel with OpenMP? I am hoping to avoid critical sections or similar constructs, which (I hear) can really slow down execution.

Additional complication: it seems that on the cluster computer where I ran my code, I can only print from the master thread (id=0). So I tried the following code. It mostly works, but when a worker thread writes to a stringstream while the main thread is reading it, the behavior can become abnormal.

ADDED: 1. The order of output doesn't matter. 2. Since my job will be terminated after 48 h, possibly while still unfinished, I can't wait until the end of the loop computation to print.

#include <iostream>
#include <vector>
#include <omp.h>
#include <sstream>
using namespace std;

int main() {
  int nthreads=10;
  omp_set_num_threads(nthreads);
  vector<stringstream> ss(nthreads);
  #pragma omp parallel for schedule(static,1)
  for(unsigned int i= 1; i<=100000;i++){
    int id=omp_get_thread_num();
    if(id==0){
      for(int idi=0;idi<nthreads;idi++)
        if(ss[idi].tellp()>ss[idi].tellg()) cout<<ss[idi].rdbuf();
    }
    // DO WORK
    // if (some condition) ss[id] << some outcome ...;
    // MORE WORK
    // ss[id] << more outcome;
  }
  return 0;
}

Solution

  • The simplest standard solution requires C++20. The first option is std::osyncstream from the <syncstream> header:

    std::osyncstream(std::cout) << my_line << "\n";
    

    But if you can use C++20, then <format> will be available too:

    std::puts(std::format("{:s}", my_line).c_str());
    

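    C I/O functions such as std::puts lock the stream for the duration of each call, so the output of a single call is not interleaved with output from other threads. Note that std::puts appends a newline itself.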

    If C++23 is available too, you can use <print>:

    std::println(stdout, "{:s}", my_line);
    

    The latest standard requires that std::print and std::println also acquire a lock on the output stream before printing. But since this is relatively new, whether you actually get that guarantee depends on the quality of the implementation.

    As a last-resort optimization, you could define a spin lock using std::atomic and lock and unlock it around every call. It can be faster than a mutex when the critical section is short and contention is low, but I don't illustrate it here.
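
For reference, here is a minimal sketch of the spin-lock idea mentioned above. It is not part of the original answer: the names SpinLock and print_results are hypothetical, and it assumes that std::atomic_flag behaves as usual across OpenMP threads, which holds on common compilers but is worth verifying for your toolchain. Because the sketch calls no omp_* functions, it also compiles and runs serially when OpenMP is not enabled.

```cpp
#include <atomic>
#include <cstdio>
#include <mutex>   // std::lock_guard
#include <string>

// Hypothetical helper, not from the original answer: a test-and-set spin lock.
// It provides lock()/unlock(), so it can be used with std::lock_guard.
class SpinLock {
public:
    void lock() noexcept {
        // test_and_set returns the previous value; keep spinning while
        // another thread already holds the lock (previous value was true).
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() noexcept { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: prints one line per iteration and returns the number of
// lines printed. The loop runs in parallel only when OpenMP is enabled
// (e.g. -fopenmp); otherwise the pragma is ignored and it runs serially.
int print_results(int n) {
    SpinLock print_lock;
    int printed = 0;  // shared; only modified while print_lock is held
    #pragma omp parallel for
    for (int i = 1; i <= n; ++i) {
        // Build the line outside the lock to keep the critical section short.
        const std::string line = "result " + std::to_string(i);
        std::lock_guard<SpinLock> guard(print_lock);
        std::puts(line.c_str());  // puts appends the newline
        ++printed;
    }
    return printed;
}
```

Calling print_results(100) from main prints 100 lines, in no particular order when the loop runs in parallel. Whether this actually beats a mutex or the stream's own lock depends on contention and on how long each thread holds the lock, so measure before adopting it.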