c fork waitpid

c - Do waitpid() and co. detect child termination only when all children have terminated?


I am writing this question with a bit of confusion, as I feel that I'm missing some point (that's why I am writing it, after all).

So I've been studying how multiple processes access a single file. I have some basic code that forks twice. Both children fprintf into the same file; the main difference is that one of them sleeps and does many more fprintfs. Each child exits when all of its fprintfs are done. Meanwhile the parent calls waitpid, and every time a child terminates it fprintfs into the same file.

What I wanted to see is 1) the child process that has more fprintfs and the sleep terminating later than the other child (I think the difference in running time should make that very likely), and that happens; 2) the first fprintf of the parent process somewhere in the middle of the file, since (as I thought!) the first child should be waitpided well before the second one terminates, and that is not what happened.

What happens every time is that both of the parent's fprintfs end up at the very end of the file, as if the parent had waited until both children terminated and only then waitpided them.

Exchanging waitpid with wait obviously produced the same result.

I have several guesses:

  1. The second child terminates before the parent has time to waitpid the first one and fprintf into the file.
  2. The OS doesn't have time to send SIGCHLD to the parent before the second child terminates.
  3. This is how waitpid works, like maybe signals are queued? (But I haven't found any specification of such functionality.)

Could somebody please explain why I am not getting the message about first child terminating in the middle of the file, but do get it at the end?


The code of the program:

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <errno.h>
  #include <sys/wait.h>
  
  #define N 1000000
  #define SLEEP_TIME 20
  
  int main(void)
  {
      FILE *fd1 = fopen("test.txt", "w");
      pid_t pid1, pid2, cpid;
      int wstatus;
  
      pid1 = fork();
      if(0 == pid1) {
          for(int i = 0; i < N; ++i) {
              fprintf(fd1, "child1 %d %d\n", getpid(), i);
          }
          sleep(SLEEP_TIME);
          for(int i = 0; i < N; ++i) {
              fprintf(fd1, "child1a %d %d\n", getpid(), i);
          }
          sleep(SLEEP_TIME);
          fclose(fd1);
          exit(EXIT_SUCCESS);
      } 
      else if(-1 == pid1) {
          exit(EXIT_FAILURE);
      }
      
      pid2 = fork();
      if(0 == pid2) { 
          for(int i = 0; i < N/2; ++i) {
              fprintf(fd1, "child2 %d %d\n", getpid(), i);
          }
          fclose(fd1);
          exit(EXIT_SUCCESS);
      } 
      else if(-1 == pid2) {
          exit(EXIT_FAILURE);
      }
      
      while(((cpid = wait(&wstatus)) != -1)) {
      //while(((cpid = waitpid(-1, &wstatus, WUNTRACED | WCONTINUED)) != -1))
          if(WIFEXITED(wstatus))
              fprintf(fd1, "child %d exited with status %d\n", cpid, wstatus);
      }
      if(errno == ECHILD) { 
          fprintf(fd1, "All children are waited for!\n");
      }
      else {
          perror("waitpid");
          exit(EXIT_FAILURE);
      }
      fclose(fd1);
      
      exit(EXIT_SUCCESS);
  }

The last lines of the resulting file:

2499998 child1a 7359 999997
2499999 child1a 7359 999998
2500000 child1a 7359 999999
2500001 child 7360 exited with status 0 //I wanted this one to be in the middle of the file!
2500002 child 7359 exited with status 0
2500003 All children are waited for!

Solution

  • No, waitpid does return each time a child exits. The problem is that your test is flawed.

    On Unix, when you access regular files with stdio functions such as fprintf, by default they are fully buffered. This is desirable when only one process is writing to the file, as it reduces system call overhead, but can be undesirable when timing is important or when trying to synchronize with other processes.

    So waitpid is in fact returning immediately after child2 exits, and fprintf is being called at that time, but fprintf doesn't write its message into the file immediately; rather, the message remains buffered in the parent's memory. It will only be written out when the buffer fills up (which doesn't happen in the parent, since the buffer is usually many KB), or when you call fflush (you don't), or when the file is closed (including on process exit). So both messages are written out together when you call fclose(fd1) in the parent, at which point both children have already exited.
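
    The buffering effect can be observed in a single process, without any fork. The following is only a minimal sketch: the names `size_on_disk` and `buffering_demo` and the file path passed in are hypothetical, and the 4096 passed to setvbuf is an arbitrary size hint chosen for illustration. It writes one short line with fprintf and checks the file's size on disk before and after fflush.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return the current size of the file on disk, or -1 on error. */
static long size_on_disk(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long)st.st_size;
}

/*
 * Hypothetical helper: write one short line through a fully buffered
 * stream and record the on-disk size before and after fflush().
 * Returns 0 on success, -1 on error.
 */
int buffering_demo(const char *path, long *before_flush, long *after_flush)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Ask for full buffering explicitly; 4096 is only a size hint here. */
    if (setvbuf(fp, NULL, _IOFBF, 4096) != 0) {
        fclose(fp);
        return -1;
    }

    int len = fprintf(fp, "child 1234 exited with status 0\n");

    /* The line is still sitting in the stream's user-space buffer. */
    *before_flush = size_on_disk(path);

    /* fflush hands the buffered bytes to the kernel via write(). */
    fflush(fp);
    *after_flush = size_on_disk(path);

    fclose(fp);
    return len < 0 ? -1 : 0;
}
```

    Calling `buffering_demo("buffering_demo.txt", &before, &after)` should leave `before` at 0 and `after` at the length of the line, which mirrors what happens to the parent's messages in the question: they exist only in the parent's memory until the stream is flushed or closed.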

    For a test that better illustrates what's happening, disable buffering on this file by calling something like setvbuf(fd1, NULL, _IONBF, 0) immediately after opening the file. Then you should see the "child2 exited" message in the middle of the file, as you expect.
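
    A scaled-down version of the experiment with buffering disabled might look like the sketch below. It is not the original program: the function name `run_unbuffered_experiment`, the messages, and the one-second sleep standing in for the long-running child are all assumptions made for illustration. The children call `_exit` rather than `exit`, which is safe here because an unbuffered stream has nothing left to flush.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: fork a slow child and a fast child that share one
 * unbuffered stream, and log each child from the parent as soon as
 * wait() returns for it. Returns 0 on success, -1 on error.
 */
int run_unbuffered_experiment(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Disable stdio buffering before the first write and before fork(). */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return -1;
    }

    pid_t slow = fork();
    if (slow == -1) {
        fclose(fp);
        return -1;
    }
    if (slow == 0) {
        sleep(1);                        /* stand-in for the long-running child */
        fprintf(fp, "slow child done\n");
        _exit(EXIT_SUCCESS);             /* nothing is buffered, so no flush needed */
    }

    pid_t fast = fork();
    if (fast == -1) {
        waitpid(slow, NULL, 0);          /* don't leave the first child unreaped */
        fclose(fp);
        return -1;
    }
    if (fast == 0) {
        fprintf(fp, "fast child done\n");
        _exit(EXIT_SUCCESS);
    }

    int status;
    pid_t pid;
    /* wait() returns once per terminated child, then -1 when none remain. */
    while ((pid = wait(&status)) != -1) {
        if (WIFEXITED(status))
            fprintf(fp, "parent reaped %s child, exit status %d\n",
                    pid == fast ? "fast" : "slow", WEXITSTATUS(status));
    }

    fclose(fp);
    return 0;
}
```

    With this sketch, the parent's message about the fast child should land in the file before the slow child's line, because each fprintf reaches the file right away instead of waiting for fclose. Calling fflush(fd1) after each fprintf in the parent would be another way to get the same effect while leaving the children fully buffered.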