cmultithreadingpthreadspthread-join

Using pthread_join() on multiple threads giving unexpected behavior


I am learning how to use threads in C and have run into a problem when creating the threads. I am making a program that takes in 2 or more file names as command line arguments, counts the number of bytes in each file in their own thread, and then outputs the name of the largest file. When I use pthread_join() directly after creating a thread, the program runs as intended. However, I know this isn't how threads should be used because it defeats the purpose. When I use pthread_join() in a for loop after creating all the threads, then the program does not work correctly. Could anyone tell me what I am doing wrong? All help is appreciated. Here is my main function.

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; //mutex for changing max_bytes and max_name
int max_bytes = 0;
char max_name[100];

struct arg_struct{ //struct to hold args to pass the threads
        int fd;
        char name[100];
};

int main(int argc, char* argv[])
{
        if(argc < 3){ //checks for correct number of arguments passed
                perror("Wrong number of arguments");
                return EXIT_FAILURE;
        }

        int arg_num = argc - 1; //holds number of arguments passed

        pthread_t threadid[arg_num]; //array of thread IDs
        struct arg_struct args;
        for(int i = 0; i < arg_num; i++){
                args.fd = open(argv[i+1], O_RDONLY);
                memcpy(args.name, argv[i+1], sizeof(args.name)); //copies file name into arg_struct
                int thread_err = pthread_create(&threadid[i], NULL, count_bytes, (void*)&args); //create thread by calling count_bytes and passing it a struct of args
                //pthread_join(threadid[i], NULL);
                if(thread_err != 0){
                        perror("pthread_create failed");
                        return EXIT_FAILURE;
                }
        }

        for(int i = 0; i < arg_num; i++){
                pthread_join(threadid[i], NULL);
        }

        printf("%s is the largest of the submitted files\n", max_name);

        return 0;
}

This is the function that the threads are running.

void *count_bytes(void* arguments)
{
        struct arg_struct *args = (struct arg_struct*)arguments; //casting arguments back to struct from void*
        int fd = args -> fd;
        char name[100];
        memcpy(name, args -> name, sizeof(name)); //copies file name into name from args.name
        int bytes = 0;

        int size = 10;
        char*  buffer = (char*) malloc(size);
        if(buffer == NULL){
                perror("malloc failed");
                exit(EXIT_FAILURE);
        }
        int buffer_count = 0;
        for(int i = 0; i < size; i++){
                buffer[i] = '\0'; //sets all elements to '\0' to determine end of file later
        }
        int read_return = read(fd, &buffer[buffer_count], 1);
        if(read_return == -1){
                perror("reading failed");
                exit(EXIT_FAILURE);
        }

        while(buffer[buffer_count] != '\0'){
                bytes++;
                buffer_count++;
                buffer[buffer_count] = '\0'; //sets previous element to '\0' to determine end of file later
                if(buffer_count >= size){
                        buffer_count = 0; //buffer will hold up to 10 elements and then go back to the beginning
                }
                read_return = read(fd, &buffer[buffer_count], 1);
                if(read_return == -1){
                        perror("reading failed");
                        exit(EXIT_FAILURE);
                }
        }

        printf("%s has %d bytes\n", name, bytes);

        pthread_mutex_lock(&mutex);
        if(bytes > max_bytes){
                max_bytes = bytes;
                memcpy(max_name, name, sizeof(max_name));
        }
        //locks mutex to avoid race condition
        //then sets bytes to max_bytes if it is later than max_bytes
        //then locks mutex to allow another thread to have access
        pthread_mutex_unlock(&mutex);

        return NULL;
}

If it is of any use, these are the two outputs produced when it is running correctly

./a.out another buffered_readword.c
another has 8 bytes
buffered_readword.c has 3747 bytes
buffered_readword.c is the largest of the submitted files

And not correctly

./a.out another buffered_readword.c
buffered_readword.c has 1867 bytes
buffered_readword.c has 1881 bytes
buffered_readword.c is the largest of the submitted files

Solution

  • The problem is that there is only one args structure. After pthread_create is called the new thread may not run immediately. By the time the threads run it is likely that they will both see the same args values. Calling pthread_join inside the thread creation loop "fixes" that because it ensures each thread finishes before args is updated to the next value.

    To fix properly pass a different args to each thread. Illustrative code to do that:

    struct arg_struct args[arg_num];
    for(int i = 0; i < arg_num; i++){
        args[i].fd = open(argv[i+1], O_RDONLY);
        memcpy(args[i].name, argv[i+1], sizeof(args[i].name));
        int thread_err = pthread_create(&threadid[i], NULL, count_bytes, &args[i]); 
        ....