cmultithreadingpthreadspthread-join

pthreads parallelization incorrect result


I'm writing a C-program using pthreads. The goal is to compute the multiples of given numbers by passing them as arguments. The numbers to multiply and the amount of multiples are free to choose.

The program is compiled with gcc -lpthread -Wall -Wextra in.c, the executable is called with ./a.out num amount num amount ...

The program allocates memory for each "input pair" and creates a thread for each computation, then all threads are joined and the memory regions to which the threads wrote are being printed to the screen.

The problem is that the program often leaves at least one of the outputs empty (0x00). By repeating the same input, the correct result only rarely appears. For example with the input ./a.out 10 3 7 5 3 4 the outputs (here compressed) look like :

Thread 0 result: 10 20 30  or Thread 0 result: 0 0 0    but rarely the Thread 0 result: 10 10 0 
Thread 1 result: 0 0 0 0 0 or Thread 1 result: 0 0 0 0 0   expected    Thread 1 result: 7 14 21 28 35
Thread 2 result: 3 6 9 12  or Thread 2 result: 3 6 9 12     result:    Thread 2 result: 3 6 9 12

So I found two workarounds, but neither of them solve the problem. They are included in the code but commented out.

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#define MAX_THREADS 100
int *thr_out[MAX_THREADS]; // global output record
// function to thread
void *threaded_mul(void* arguments[3])
{
  int* out = arguments[0];
  long num = (long)arguments[1];
  long len = (long)arguments[2];
  for(int i=0; i<len; i++)
    out[i]=num*(i+1);
  pthread_exit(NULL);
}

int main(int argc, char* argv[])
{
  int amt_thr = argc/2;       // one thread needs two arguments
  int thr_i_num[amt_thr];     // number to generate multiples
  int thr_o_len[amt_thr];     // how many multiples to generate
  pthread_t thr_id[amt_thr];  // holds thread ids
  long int thr_args[3];       // forms argument for pthread_create call
  printf("%d threads needed\n",amt_thr);
  for(int i=0; i<amt_thr;i++)
  { // calculate how much memory is needed for each thread
    int oi = 2*i+1; // 0 1 2 3 -> 1 3 5 7
    thr_o_len[i] = strtol(argv[oi+1], NULL, 10);
    thr_i_num[i] = strtol(argv[oi], NULL, 10);
    // allocate the memory
    thr_out[i]=calloc(thr_o_len[i], sizeof(int));
  }
  for(int i=0; i<amt_thr; i++)
  { // create threads
    thr_args[0] = (long)thr_out[i]; // address to write output to
    thr_args[1] = thr_i_num[i];     // input 'val' for thread (number to multiply)
    thr_args[2] = thr_o_len[i];     // output length 'len' for thread
    pthread_create(&thr_id[i], NULL, (void*)threaded_mul, &thr_args);
    //for(int i=0; i<32768; i++){}  //  either delay here
    //pthread_join(thr_id[i],NULL); // or wait until the thread finishes
  }
  printf("joining threads\n");
  for(int i=0; i<amt_thr; i++)
    pthread_join(thr_id[i],NULL);

   for(int t=0; t<amt_thr; t++)
  { // printing resuls
    printf("Thread %d result: ",t);
    for(int j=0; j<thr_o_len[t]; j++)
      printf("%d ",thr_out[t][j]);
    putchar('\n');
  }
  for(int i=0; i<amt_thr; i++)
    free(thr_out[i]);
  return 0;
}

I assume that after a thread is created, main continues normally and the thread starts immediately (on another core) but the same address-space. My observation is that most of the time, at least one thread fails to get the correct arguments and two or more threads do the same computation and write to the same destination, thus leaving the other output destinations untouched.

How to avoid this behavior ?

Edit: and as I understand based on your answers the problem is that before the newly created thread gets to read its arguments &thr_args from memory, the for loop //create threads already wrote new arguments in thr_args[]. But the argument needs to be a pointer to memory, as required by pthread_create.

Edit2: I solved tho problem by writing all inputs (3 each thread) for all threads to memory instead of updating the global input variable thr_args[] inside a for-loop for the reason stated in the paragraph above.


Solution

  • You should maybe post the expected ouput, but from lightly reading your story it seems you shouldn't find zeros in the output.

    When you start your threads, you pass in an array reference (thr_args[]). That means every thread sees the same parameter, which is a memory location. You over-write this array in the thread create loop, so what any particular thread sees is timing / os / #cores dependent. Not quite random, but damn close.

    As a quick hack, I changed your program around pthread_create to:

     void *x = memdup(thr_args, sizeof thr_args);
     pthread_create(&thr_id[i], NULL, threaded_mul, x);
    

    and added a little func above:

    static void *memdup(void *p, size_t n) {
            void *x;
            if ((x = malloc(n))) {
                    memcpy(x, p, n);
                    return x;
            } else {
                    abort();
            }
    }
    

    and your program prints:

    Thread 0 result: 10 20 30 
    Thread 1 result: 7 14 21 28 35 
    Thread 2 result: 3 6 9 12 
    

    of course, this leaks. So you want to correct your program to associate the argument array with the thread, and delete it when the join for that thread is successful.