I am trying to write a program that takes a list of counts on the command line and runs them in parallel.
I have a count.c file that does the counting:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char** argv)
{
    assert(argc>1);
    int length= atoi(argv[1]);
    assert(length>0);
    int pid = getpid();
    printf("%d : %s\n", pid, "start");
    for (int i=length; i>0; i--)
    {
        printf("%d : %d\n", pid, i);
        sleep(1);
    }
    printf("%d : %s\n", pid, "done");
    return 0;
}
So if I enter ./count 5 in bash, the program counts down from 5 to 1.
I have another file, multiple.c:
int main(int argc, char** argv)
{
    assert(argc>2);
    unsigned int i;
    char * nameExec = (char*) malloc(sizeof(char)*(strlen(argv[1])-1));
    char * time;
    int number = argc-2; // number of counts to do
    // name of program
    for (i=2; i<strlen(argv[1]); i++)
    {
        nameExec[i-2]=argv[1][i];
    }
    nameExec[i-2]='\0';
    if(number==1) // only one count, so no fork is needed
    {
        execl(argv[1], nameExec, argv[2], NULL);
    } else
    {
        for (unsigned int i=2; i<number+1; i++) // 2 counts need 1 fork, 3 counts need 2 forks...
        {
            if(fork()==0) // child process
            {
                time = argv[i];
            } else
            {
                time = argv[i+1];
                wait(NULL); // father process waits for child
            }
        }
        execl(argv[1], nameExec, time, NULL);
    }
    return 0;
}
What I want is to enter, for instance, ./multiple ./count 5 4 3 on the command line and have it start 3 counts in parallel (3 parallel processes).
Tests I have done: if I enter ./multiple ./count 5 4, it does two counts, one starting from 5 and the other from 4, but one after the other rather than simultaneously. If I enter ./multiple ./count 5 4 3, it does 4 counts: one starting from 4, then one from 3, then another from 4, and another from 3.
I really don't understand this behavior. From what I understand, fork() duplicates the process, and execl() abandons the current program and starts executing another one.
Please help!
(Also, I am trying to understand fork() and execl(), so I would like to find a way to solve my problem using these two functions.)
Your original code runs the child processes in sequence rather than concurrently because the wait() call is inside the loop: each parent blocks until the child it has just forked finishes.
You don't need to copy the program name. You could either use argv[1] directly (or simply assign it to nameExec) or skip the first couple of characters by using nameExec = &argv[1][2];.
It's very tricky to understand the operation of the loop in your code; it's sent me screaming a few times as I tried to wrap my brain around it. I'm going to simply write the code from scratch, in two variants.
The simpler variant to understand has the parent (initial) process launch one child per counter and then wait until it has no children left. It reports the PID and exit status of each child as it exits; it would be feasible to simply collect the corpses without printing an 'in memoriam'.
/* SO 6021-0236 */
/* Variant 1: Original process forks children and waits for them to complete */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    assert(argc > 2);
    /* Launch children */
    for (int i = 2; i < argc; i++)
    {
        if (fork() == 0) // child process
        {
            execl(argv[1], argv[1], argv[i], (char *)0);
            fprintf(stderr, "failed to execute %s\n", argv[1]);
            exit(EXIT_FAILURE);
        }
    }
    /* Wait for children */
    int corpse;
    int status;
    while ((corpse = wait(&status)) > 0)
    {
        printf("%d: PID %d exited with status 0x%.4X\n",
               (int)getpid(), corpse, status);
    }
    return 0;
}
I renamed your counter program so the source file is count23.c and the program is count23; the only other significant change was removing the space before the colon in the printf() output.
I called the source code above multiple43.c and compiled it to multiple43.
$ multiple43 count23 1
54251: start
54251: 1
54251: done
54250: PID 54251 exited with status 0x0000
$ multiple43 count23 3 4 5
54261: start
54261: 5
54260: start
54260: 4
54259: start
54259: 3
54261: 4
54260: 3
54259: 2
54261: 3
54260: 2
54259: 1
54261: 2
54260: 1
54259: done
54258: PID 54259 exited with status 0x0000
54261: 1
54260: done
54258: PID 54260 exited with status 0x0000
54261: done
54258: PID 54261 exited with status 0x0000
$
In the run with three children, you can see that all three are producing output concurrently.
This is the variant that I think you should use unless there is an explicit requirement to do something else.
The other variant more or less approximates your code (though the approximation is not very good) in that the original process itself executes the counter program too. Therefore, if the original process has fewer cycles than the others, it terminates before the others complete (see the difference between the 3 4 5 and 5 4 3 examples). It does run the counters concurrently, though.
/* SO 6021-0236 */
/* Variant 2: Original process launches children, then execs itself */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    assert(argc > 2);
    /* Launch children */
    for (int i = 3; i < argc; i++)
    {
        if (fork() == 0) // child process
        {
            execl(argv[1], argv[1], argv[i], (char *)0);
            fprintf(stderr, "failed to execute %s\n", argv[1]);
            exit(EXIT_FAILURE);
        }
    }
    execl(argv[1], argv[1], argv[2], (char *)0);
    fprintf(stderr, "failed to execute %s\n", argv[1]);
    return(EXIT_FAILURE);
}
This code was multiple53.c, compiled to multiple53.
$ multiple53 count23 3 4 5
54269: start
54268: start
54267: start
54269: 5
54268: 4
54267: 3
54269: 4
54268: 3
54267: 2
54268: 2
54267: 1
54269: 3
54268: 1
54267: done
54269: 2
$ 54268: done
54269: 1
54269: done
$ multiple53 count23 5 4 3
54270: start
54272: start
54270: 5
54272: 3
54271: start
54271: 4
54270: 4
54272: 2
54271: 3
54272: 1
54270: 3
54271: 2
54271: 1
54272: done
54270: 2
54270: 1
54271: done
54270: done
$
The blank line appeared because I hit return: the shell waits only for the original process (54267, which had the shortest count), so the prompt appeared 3 lines earlier but was followed by more output from 54268 and 54269. I regard this variant as much less likely to be what's wanted.
To try and understand the original code, I instrumented it after making some minor changes (saved in multiple31.c and compiled to multiple31):
/* SO 6021-0236 */
/* Original algorithm with instrumentation */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    assert(argc > 2);
    char *nameExec = argv[1];
    char *time;
    int number = argc - 2;
    if (number == 1)
    {
        printf("%d: name = %s; time = %s\n", (int)getpid(), nameExec, argv[2]);
        execl(argv[1], nameExec, argv[2], NULL);
    }
    else
    {
        for (int i = 2; i <= number; i++) // Equivalent condition (was i < number + 1)
        {
            printf("%d: i = %d; number = %d\n", (int)getpid(), i, number);
            pid_t kid = fork();
            if (kid == 0)
            {
                time = argv[i];
                printf("%d: i = %d; time = %s; ppid = %d\n",
                       (int)getpid(), i, time, (int)getppid());
            }
            else
            {
                time = argv[i + 1];
                printf("%d: i = %d; time = %s; waiting for %d\n",
                       (int)getpid(), i, time, (int)kid);
                int status;
                int corpse = wait(&status);
                printf("%d: i = %d; time = %s; PID %d exited with status 0x%.4X\n",
                       (int)getpid(), i, time, corpse, status);
            }
        }
        printf("%d: name = %s; time = %s\n", (int)getpid(), nameExec, time);
        execl(argv[1], nameExec, time, NULL);
    }
    printf("%d: this should not be reached!\n", (int)getpid());
    return 0;
}
When run with 4 times, it produces output such as:
$ multiple31 count23 5 4 3 2
54575: i = 2; number = 4
54575: i = 2; time = 4; waiting for 54576
54576: i = 2; time = 5; ppid = 54575
54576: i = 3; number = 4
54576: i = 3; time = 3; waiting for 54577
54577: i = 3; time = 4; ppid = 54576
54577: i = 4; number = 4
54577: i = 4; time = 2; waiting for 54578
54578: i = 4; time = 3; ppid = 54577
54578: name = count23; time = 3
54578: start
54578: 3
54578: 2
54578: 1
54578: done
54577: i = 4; time = 2; PID 54578 exited with status 0x0000
54577: name = count23; time = 2
54577: start
54577: 2
54577: 1
54577: done
54576: i = 3; time = 3; PID 54577 exited with status 0x0000
54576: i = 4; number = 4
54576: i = 4; time = 2; waiting for 54579
54579: i = 4; time = 3; ppid = 54576
54579: name = count23; time = 3
54579: start
54579: 3
54579: 2
54579: 1
54579: done
54576: i = 4; time = 2; PID 54579 exited with status 0x0000
54576: name = count23; time = 2
54576: start
54576: 2
54576: 1
54576: done
54575: i = 2; time = 4; PID 54576 exited with status 0x0000
54575: i = 3; number = 4
54575: i = 3; time = 3; waiting for 54580
54580: i = 3; time = 4; ppid = 54575
54580: i = 4; number = 4
54580: i = 4; time = 2; waiting for 54581
54581: i = 4; time = 3; ppid = 54580
54581: name = count23; time = 3
54581: start
54581: 3
54581: 2
54581: 1
54581: done
54580: i = 4; time = 2; PID 54581 exited with status 0x0000
54580: name = count23; time = 2
54580: start
54580: 2
54580: 1
54580: done
54575: i = 3; time = 3; PID 54580 exited with status 0x0000
54575: i = 4; number = 4
54575: i = 4; time = 2; waiting for 54582
54582: i = 4; time = 3; ppid = 54575
54582: name = count23; time = 3
54582: start
54582: 3
54582: 2
54582: 1
54582: done
54575: i = 4; time = 2; PID 54582 exited with status 0x0000
54575: name = count23; time = 2
54575: start
54575: 2
54575: 1
54575: done
$
Tracing through why that's the output is fiendish. I started writing an explanation, but I found that my explanation didn't match the actual output, yet again. However, instrumentation along the lines shown is how I usually understand what's going on. One of the key points (simplifying slightly) is that every process is waiting for a child to die except for the one child that is doing its countdown. Running tests with 1, 2, or 3 times instead of 4 is consistent with this, but simpler (simultaneously less confusing and more confusing). Using 5 times increases the amount of output but doesn't really provide more enlightenment.