
Parallel processes using fork() with command line parameters in C


I am trying to create a program that takes a number of counts on the command line and runs them in parallel.

I have one count.c file that does the counting:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char** argv)
{
    assert(argc>1);

    int length = atoi(argv[1]);   // number to count down from
    assert(length>0);

    int pid = getpid();

    printf("%d : %s\n", pid, "start");

    for (int i=length; i>0; i--)
    {
        printf("%d : %d\n", pid, i);

        sleep(1);
    }

    printf("%d : %s\n", pid, "done");

    return 0;
}

So if I enter "./count 5" in bash, the program counts down from 5 to 1.

I have another file, multiple.c:

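#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char** argv)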
{
   assert(argc>2);
   unsigned int i;

   char * nameExec = (char*) malloc(sizeof(char)*(strlen(argv[1])-1));

   char * time;

   int number = argc-2; // number of counting to do

    // name of program
    for (i=2; i<strlen(argv[1]); i++)
    {
      nameExec[i-2]=argv[1][i];
    }
    nameExec[i-2]='\0';


    if(number==1) //  one counting there is no fork needed
    {
      execl(argv[1], nameExec, argv[2], NULL);
    } else
    {
      for (unsigned int i=2; i<number+1; i++) // if we have 2 counts to do, then we need 1 fork, if there is 3 counts to do then there is 2 forks...
      {
        if(fork()==0) // child process
        {
          time = argv[i];
        } else
        {
          time = argv[i+1];
          wait(NULL); // father process waits for child
        }

      }

      execl(argv[1], nameExec, time, NULL);

    }

    return 0;
}

What I want is to enter, for instance, "./multiple ./count 5 4 3" on the command line and have it start 3 counts in parallel (3 parallel processes).

Tests I have done: if I enter ./multiple ./count 5 4, it does two counts, one starting from 5 and the other from 4, but one after the other rather than simultaneously. If I enter ./multiple ./count 5 4 3, it does 4 counts: one starting from 4, then one starting from 3, then another starting from 4, and another starting from 3.

I really don't understand this behavior. From what I understand, fork() duplicates the process, and execl() abandons the current program and starts executing another one.

Please help!

(Also, I am trying to understand the use of fork() and execl(), so I would like to find a way to solve my problem using these two functions.)


Solution

  • Your original code runs the child processes in sequence rather than concurrently because you have the wait() call inside the loop.
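
    The effect of where wait() sits can be seen in isolation with a minimal sketch. This is not part of the original code: the helper run_children() and the 0.2-second nap are assumptions made for illustration, and it assumes a POSIX system that provides clock_gettime() and nanosleep().

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    /* Hypothetical helper: fork n children that each nap for 0.2 s and
       return the elapsed wall-clock time in seconds. */
    static double run_children(int n, int wait_in_loop)
    {
        struct timespec nap = { 0, 200000000L };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < n; i++)
        {
            pid_t pid = fork();
            if (pid == 0)               /* child: nap, then exit without flushing stdio */
            {
                nanosleep(&nap, NULL);
                _exit(0);
            }
            assert(pid > 0);            /* sketch only: treat fork failure as fatal */
            if (wait_in_loop)
                wait(NULL);             /* blocks before the next child is launched */
        }
        while (wait(NULL) > 0)          /* collect any children not yet waited for */
            ;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        printf("wait() inside the loop: %.2f s\n", run_children(3, 1));
        printf("wait() after the loop:  %.2f s\n", run_children(3, 0));
        return 0;
    }
    ```

    With three children, the first call should take about three naps and the second about one, because waiting inside the loop stops the parent from launching the next child until the previous one has exited.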

    You don't need to copy the program name. You could either use argv[1] directly (or simply assign it to nameExec) or skip the first couple of characters by using nameExec = &argv[1][2];.
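
    As a small illustration of skipping the prefix without copying, here is a sketch. The helper name skip_dot_slash() is hypothetical. Unlike &argv[1][2], it checks that the prefix is actually there, since skipping two characters unconditionally would turn a name such as count23 into unt23.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: return a pointer past a leading "./" if present.
       Nothing is copied or allocated; the result points into the original string. */
    static const char *skip_dot_slash(const char *path)
    {
        return (strncmp(path, "./", 2) == 0) ? path + 2 : path;
    }

    int main(void)
    {
        const char *path = "./count";
        const char *name = skip_dot_slash(path);

        assert(name == &path[2]);       /* same pointer as &argv[1][2] would give */
        printf("%s -> %s\n", path, name);
        printf("%s -> %s\n", "count23", skip_dot_slash("count23"));
        return 0;
    }
    ```

    The returned pointer can be passed to execl() as the second argument in place of nameExec.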

    It's very tricky to understand the operation of the loop in your code; it sent me screaming a few times as I tried to wrap my brain around it. I'm going to simply write the code from scratch, in two variants.

    Variant 1

    The simpler variant to understand has the parent (initial) process launch one child per counter, and then it waits until it has no children left. It reports the PID and exit status of the children as they exit; it would be feasible to simply collect the corpses without printing an 'in memoriam'.

    /* SO 6021-0236 */
    /* Variant 1: Original process forks children and waits for them to complete */
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>
    
    int main(int argc, char **argv)
    {
        assert(argc > 2);
    
        /* Launch children */
        for (int i = 2; i < argc; i++)
        {
            if (fork() == 0)     // child process
            {
                execl(argv[1], argv[1], argv[i], (char *)0);
                fprintf(stderr, "failed to execute %s\n", argv[1]);
                exit(EXIT_FAILURE);
            }
        }
    
        /* Wait for children */
        int corpse;
        int status;
        while ((corpse = wait(&status)) > 0)
        {
            printf("%d: PID %d exited with status 0x%.4X\n",
                   (int)getpid(), corpse, status);
        }
    
        return 0;
    }
    

    I renamed your counter program so that the source file is count23.c and the program is count23; the only other significant change was removing the space before the colon in the printf() output.

    I called the source code above multiple43.c, compiled to multiple43.

    $ multiple43 count23 1
    54251: start
    54251: 1
    54251: done
    54250: PID 54251 exited with status 0x0000
    $ multiple43 count23 3 4 5
    54261: start
    54261: 5
    54260: start
    54260: 4
    54259: start
    54259: 3
    54261: 4
    54260: 3
    54259: 2
    54261: 3
    54260: 2
    54259: 1
    54261: 2
    54260: 1
    54259: done
    54258: PID 54259 exited with status 0x0000
    54261: 1
    54260: done
    54258: PID 54260 exited with status 0x0000
    54261: done
    54258: PID 54261 exited with status 0x0000
    $
    

    In the run with three children, you can see that all three are producing output concurrently.

    This is the variant that I think you should use unless there is an explicit requirement to do something else.
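
    The status that Variant 1 prints in hex packs several fields into one int. A hedged sketch of decoding it with the standard <sys/wait.h> macros follows; report() and child_exit_code() are hypothetical helper names, not part of the code above.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical helper: print a readable description of a wait() status. */
    static void report(pid_t pid, int status)
    {
        if (WIFEXITED(status))
            printf("PID %d exited with code %d\n", (int)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("PID %d was killed by signal %d\n", (int)pid, WTERMSIG(status));
        else
            printf("PID %d: unexpected status 0x%.4X\n", (int)pid, (unsigned)status);
    }

    /* Hypothetical helper: fork a child that exits with `code` and return the
       decoded exit code, or -1 if anything else happened. */
    static int child_exit_code(int code)
    {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(code);                /* child: exit at once, no stdio flush */

        int status;
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        report(pid, status);
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

    int main(void)
    {
        int code = child_exit_code(3);
        assert(code == 3);
        printf("decoded exit code: %d\n", code);
        return 0;
    }
    ```

    The same report() call could replace the printf() in the wait loop of Variant 1 if a decoded exit code is more useful than the raw hex value.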

    Variant 2

    The other variant more or less approximates your code (though the approximation is not very good) in that the original process itself executes the counter program too. Therefore, if the original process has fewer cycles than the others, it terminates before the others complete (see the difference between the 3 4 5 and 5 4 3 examples). It does run the counters concurrently, though.

    /* SO 6021-0236 */
    /* Variant 2: Original process launches children, then execs itself */
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>
    
    int main(int argc, char **argv)
    {
        assert(argc > 2);
    
        /* Launch children */
        for (int i = 3; i < argc; i++)
        {
            if (fork() == 0)     // child process
            {
                execl(argv[1], argv[1], argv[i], (char *)0);
                fprintf(stderr, "failed to execute %s\n", argv[1]);
                exit(EXIT_FAILURE);
            }
        }
    
        execl(argv[1], argv[1], argv[2], (char *)0);
        fprintf(stderr, "failed to execute %s\n", argv[1]);
        return(EXIT_FAILURE);
    }
    

    This code was multiple53.c compiled to multiple53.

    $ multiple53 count23 3 4 5
    54269: start
    54268: start
    54267: start
    54269: 5
    54268: 4
    54267: 3
    54269: 4
    54268: 3
    54267: 2
    54268: 2
    54267: 1
    54269: 3
    54268: 1
    54267: done
    54269: 2
    $ 54268: done
    54269: 1
    54269: done
    
    $ multiple53 count23 5 4 3
    54270: start
    54272: start
    54270: 5
    54272: 3
    54271: start
    54271: 4
    54270: 4
    54272: 2
    54271: 3
    54272: 1
    54270: 3
    54271: 2
    54271: 1
    54272: done
    54270: 2
    54270: 1
    54271: done
    54270: done
    $
    

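    The blank line appeared because I hit return: the prompt appeared 3 lines earlier but was followed by more output from 54268 and 54269. The shell prints its prompt as soon as the original process (54267 in that run) finishes, whether or not the children are still counting. I regard this variant as much less likely to be what's wanted.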

    Instrumented Variant 0

    To try and understand the original code, I instrumented it after making some minor changes (saved in multiple31.c and compiled to multiple31):

    /* SO 6021-0236 */
    /* Original algorithm with instrumentation */
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>
    
    int main(int argc, char **argv)
    {
        assert(argc > 2);
        char *nameExec = argv[1];
        char *time;
        int number = argc - 2;
    
        if (number == 1)
        {
            printf("%d: name = %s; time = %s\n", (int)getpid(), nameExec, argv[2]);
            execl(argv[1], nameExec, argv[2], NULL);
        }
        else
        {
            for (int i = 2; i <= number; i++)  // Equivalent to the original i < number + 1
            {
                printf("%d: i = %d; number = %d\n", (int)getpid(), i, number);
                pid_t kid = fork();
                if (kid == 0)
                {
                    time = argv[i];
                    printf("%d: i = %d; time = %s; ppid = %d\n",
                           (int)getpid(), i, time, (int)getppid());
                }
                else
                {
                    time = argv[i + 1];
                    printf("%d: i = %d; time = %s; waiting for %d\n",
                           (int)getpid(), i, time, (int)kid);
                    int status;
                    int corpse = wait(&status);
                    printf("%d: i = %d; time = %s; PID %d exited with status 0x%.4X\n",
                           (int)getpid(), i, time, corpse, status);
                }
            }
            printf("%d: name = %s; time = %s\n", (int)getpid(), nameExec, time);
            execl(argv[1], nameExec, time, NULL);
        }
    
        printf("%d: this should not be reached!\n", (int)getpid());
        return 0;
    }
    

    When run with 4 times, it produces output such as:

    $ multiple31 count23 5 4 3 2
    54575: i = 2; number = 4
    54575: i = 2; time = 4; waiting for 54576
    54576: i = 2; time = 5; ppid = 54575
    54576: i = 3; number = 4
    54576: i = 3; time = 3; waiting for 54577
    54577: i = 3; time = 4; ppid = 54576
    54577: i = 4; number = 4
    54577: i = 4; time = 2; waiting for 54578
    54578: i = 4; time = 3; ppid = 54577
    54578: name = count23; time = 3
    54578: start
    54578: 3
    54578: 2
    54578: 1
    54578: done
    54577: i = 4; time = 2; PID 54578 exited with status 0x0000
    54577: name = count23; time = 2
    54577: start
    54577: 2
    54577: 1
    54577: done
    54576: i = 3; time = 3; PID 54577 exited with status 0x0000
    54576: i = 4; number = 4
    54576: i = 4; time = 2; waiting for 54579
    54579: i = 4; time = 3; ppid = 54576
    54579: name = count23; time = 3
    54579: start
    54579: 3
    54579: 2
    54579: 1
    54579: done
    54576: i = 4; time = 2; PID 54579 exited with status 0x0000
    54576: name = count23; time = 2
    54576: start
    54576: 2
    54576: 1
    54576: done
    54575: i = 2; time = 4; PID 54576 exited with status 0x0000
    54575: i = 3; number = 4
    54575: i = 3; time = 3; waiting for 54580
    54580: i = 3; time = 4; ppid = 54575
    54580: i = 4; number = 4
    54580: i = 4; time = 2; waiting for 54581
    54581: i = 4; time = 3; ppid = 54580
    54581: name = count23; time = 3
    54581: start
    54581: 3
    54581: 2
    54581: 1
    54581: done
    54580: i = 4; time = 2; PID 54581 exited with status 0x0000
    54580: name = count23; time = 2
    54580: start
    54580: 2
    54580: 1
    54580: done
    54575: i = 3; time = 3; PID 54580 exited with status 0x0000
    54575: i = 4; number = 4
    54575: i = 4; time = 2; waiting for 54582
    54582: i = 4; time = 3; ppid = 54575
    54582: name = count23; time = 3
    54582: start
    54582: 3
    54582: 2
    54582: 1
    54582: done
    54575: i = 4; time = 2; PID 54582 exited with status 0x0000
    54575: name = count23; time = 2
    54575: start
    54575: 2
    54575: 1
    54575: done
    $
    

    Tracing through why that's the output is fiendish. I started writing an explanation, but found, yet again, that it didn't match the actual output. However, instrumentation along the lines shown is how I usually work out what's going on. One of the key points (simplifying slightly) is that every process is waiting for a child to die except the one child that is doing its countdown. Running tests with 1, 2, or 3 times instead of 4 is consistent with this, but simpler (simultaneously less confusing and more confusing). Using 5 times increases the amount of output but doesn't really provide more enlightenment.
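
    One way to reason about the trace without forking anything is to mirror the loop as a recursion: each fork() becomes a recursive call that resumes the loop at i + 1, and because the parent waits, the child's whole subtree finishes first. The sketch below is an assumption-laden model rather than the original program; simulate(), predict() and the runs array are hypothetical names, and it records which time each process would pass to execl().

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define MAX_RUNS 64                 /* enough for up to 7 times; more are dropped */

    static const char *runs[MAX_RUNS];  /* times passed to execl(), in order */
    static int n_runs;

    /* Model one process that enters the loop at index `start` holding `time`. */
    static void simulate(char **argv, int number, int start, const char *time)
    {
        for (int i = start; i <= number; i++)
        {
            simulate(argv, number, i + 1, argv[i]);  /* child: time = argv[i], runs first */
            time = argv[i + 1];                      /* parent: continues after wait() */
        }
        if (n_runs < MAX_RUNS)
            runs[n_runs++] = time;                   /* this process now execs the counter */
    }

    /* Fill runs[] for a given command line and return how many countdowns happen. */
    static int predict(int argc, char **argv)
    {
        int number = argc - 2;
        n_runs = 0;
        if (number == 1)
            runs[n_runs++] = argv[2];                /* the no-fork branch */
        else
            simulate(argv, number, 2, NULL);
        return n_runs;
    }

    int main(void)
    {
        char *args[] = { "multiple31", "count23", "5", "4", "3", "2", NULL };
        int n = predict(6, args);

        assert(n == 8);
        printf("%d countdowns:", n);
        for (int i = 0; i < n; i++)
            printf(" %s", runs[i]);
        putchar('\n');
        return 0;
    }
    ```

    For the four-time run this model gives eight countdowns alternating 3 and 2, matching the eight PIDs (54575 to 54582) in the trace. For 5 4 3 it gives 4, 3, 4, 3, which is what the question reports. In this model only the last two times are ever counted, and the number of countdowns doubles with each extra time.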