c++linuxdaemonpclose

How to get the correct exit code of a shell command executed via pipe using popen() and pclose() in a daemonized process?


I'm writing a C++ daemon that is going to run on a Linux embedded system and execute a shell command. I'm trying to get both the output of the command (stdout) as well as the exit code of said shell command. I followed the following SO question on how to get the return value or rather the exit code of a shell command when using pipes: How to execute a command and get return code stdout and stderr of command in C++

This works as expected until I daemonize my process. When I daemonize the process pclose() always returns "-1" regardless of if the command succeeds or fails. The expected result here would be a "0" for successful execution and an integer ">0" for the error code of the shell command.

The output of the command - stdout - can be correctly read and interpreted but trying to pclose the pipe to get the exit code fails with the error code "-1" (See the man page for pclose and how to interpret the return value)

I used this SO question as a reference on how to create a linux daemon: "Creating a daemon in Linux" as well as the book "Linux-UNIX-Programmierung" (a german book) on how to create a daemon and so far that has worked for me without any issues except for breaking the behaviour of pclose().

Here is an example on how to reproduce the issue:

#include <array>  // For std::array
#include <memory> // For std::unique_ptr
#include <string>
#include <sys/stat.h>
#include <sys/syslog.h> // For all syslog things
#include <sys/wait.h>
#include <unistd.h>

// Just a helper method for signal handling
void signalHandler(int sig) {
  switch (sig) {
  case SIGINT:
  case SIGTERM:
    break;
  }
}

int main(int argc, char *argv[]) {
  /* Open log file to be able to use syslog */
  setlogmask(LOG_UPTO(LOG_DEBUG));
  openlog("MyDemoProg", LOG_PID, LOG_DAEMON);

#if 1 // Set to 0 to disable the daemonizing
  /* Fork off the parent process */
  pid_t pid = fork();

  /* An error occurred */
  if (pid < 0)
    exit(EXIT_FAILURE);

  /* Success: Let the parent terminate */
  if (pid > 0)
    exit(EXIT_SUCCESS);

  /* On success: The child process becomes session leader */
  if (setsid() < 0)
    exit(EXIT_FAILURE);

  /* Catch, ignore and handle signals */
  signal(SIGCHLD, SIG_IGN);

  /* Set up a signal handler */
  struct sigaction newSigAction;
  newSigAction.sa_handler = signalHandler;
  sigemptyset(&newSigAction.sa_mask);
  newSigAction.sa_flags = 0;

  /* Signals to handle */
  sigaction(SIGHUP, &newSigAction, NULL);  /* catch hangup signal */
  sigaction(SIGTERM, &newSigAction, NULL); /* catch term signal */
  sigaction(SIGINT, &newSigAction, NULL);  /* catch interrupt signal */

  /* Fork off for the second time*/
  pid = fork();

  /* An error occurred */
  if (pid < 0)
    exit(EXIT_FAILURE);

  /* Success: Let the parent terminate */
  if (pid > 0)
    exit(EXIT_SUCCESS);

  /* Set new file permissions */
  umask(0);

  /* Change the working directory to the root directory */
  /* or another appropriate directory */
  chdir("/");

  /* Close all open file descriptors */
  int x;
  for (x = sysconf(_SC_OPEN_MAX); x >= 0; x--) {
    close(x);
  }
#endif // end of daemonizing

  std::string command = "ls /var/bla/; sleep 2; echo test";
  syslog(LOG_DEBUG, "Command is: %s", command.c_str());

  int rc = -999; // the return code variable, set to some value to see if it's
                 // truly changed
  std::array<char, 16> buffer;
  std::string commandResult;

  // A wrapper function to be able to get the return code while still using the
  // automatic close function wizardy of unique_ptr
  auto pclose_wrapper = [&rc](FILE *cmd) { rc = pclose(cmd); };
  {
    const std::unique_ptr<FILE, decltype(pclose_wrapper)> pipe(
        popen(command.c_str(), "r"), pclose_wrapper);

    if (!pipe) {
      syslog(LOG_ERR, "Could not open pipe! Exiting");
      return EXIT_FAILURE;
    }

    /* Read in the pipe and save the content to a buffer */
    while (::fgets(buffer.data(), buffer.size(), pipe.get()) != nullptr) {
      commandResult += buffer.data();
    }
  }
  syslog(LOG_DEBUG, "Command result is: %s", commandResult.c_str());
  syslog(LOG_DEBUG, "Return code is: %d", rc);

  return EXIT_SUCCESS;
}

To switch from the daemonized and regular version change the #if flag on line 22 to "0". This will diable all the daemonizing related code.

I compiled this code using g++ main.cpp -o pcloseTest -Wall -Werror on my main development machine running Arch Linux x64 with g++ (GCC) version 11.2.0. Then I run the program directly from the terminal with ./pcloseTest and monitor the output of the logging with journalctl (journalctl -f).

I also compiled the same code using my arm-cross-compile tool-chain to test if the error is reproducable on another architecture. The GCC version of my arm-cross-compile tool-chain is 7.3.0 and the OS which runs the program is a customized version of Yocto Linux for 32Bit ARM.

Both behave the same and produce the same output - with the expected output being produced in the non-daemonized version and the unexpected output of "-1" in the daemonized program.

Here is the output of the system log-file for the embedded "Yocto Linux" system (/var/log/message) with the daemonizing enabled:

May  9 15:43:08 MY-EMBEDDED-SYSTEM daemon.debug MyDemoProg[10255]: Command is: ls /var/bla/; sleep 2; echo test
May  9 15:43:10 MY-EMBEDDED-SYSTEM daemon.debug MyDemoProg[10255]: Command result is: test
May  9 15:43:10 MY-EMBEDDED-SYSTEM daemon.debug MyDemoProg[10255]: Return code is: -1

and here with it disabled:

May  9 15:43:26 MY-EMBEDDED-SYSTEM daemon.debug MyDemoProg[10262]: Command is: ls /var/bla/; sleep 2; echo test
May  9 15:43:28 MY-EMBEDDED-SYSTEM daemon.debug MyDemoProg[10262]: Command result is: test
May  9 15:43:28 MY-EMBEDDED-SYSTEM daemon.debug MyDemoProg[10262]: Return code is: 0

For comparison here is the code from my x64 Linux (systemd and journalctl) for the daemonized code:

May 09 16:06:02 aero15 MyDemoProg[76950]: Command is: ls /var/bla/; sleep 2; echo test
May 09 16:06:04 aero15 MyDemoProg[76950]: Command result is: test
May 09 16:06:04 aero15 MyDemoProg[76950]: Return code is: -1

and the normal code:

May 09 16:05:34 aero15 MyDemoProg[76805]: Command is: ls /var/bla/; sleep 2; echo test
May 09 16:05:36 aero15 MyDemoProg[76805]: Command result is: test
May 09 16:05:36 aero15 MyDemoProg[76805]: Return code is: 0

As you can see in all four cases the command result can be correctly read as "test" while in the non-daemon program the return code of pclose() is "0" as expected (to signal a successful execution of the command) and "-1" in the daemonized version, showing that something unexpected happened with pclose().

I tried to research if popen() or pclose() have some known strange behaviour in relation to daemonizing a process but I couldn't find anything specific.
My own suspicion is that maybe a certain signal is not handled by the daemonized process that is requited for pclose() to work or that forking twice to daemonize the process and then forking again for the execution of the pipe with popen breaks something. But I am not which signal might not be handled correctly or might be missing.

For reference I also tried to implement my own version of pclose() using the waitpid() function as suggested by the man page RATIONALE but I scrapped that as it behaved similarly to pclose() and I'm assuming the underlaying issue is the same.

Thank you very much in advance for any help and please let me know if further information or more detail is required.


Solution

  • The issue with my program was, that the signal SIGCHLD was not handled by my program.

    After evaluating errno and checking the error given by pclose() I was able to find a solution. errno returned ECHILD which means "No child processes" according to the errno man page. I caught this error by adding some more handling to my pclose_wrapper lambda function:

    ...
      auto pclose_wrapper = [&rc](FILE *cmd) {
        rc = pclose(cmd);
        if (rc < 0) {
          /* Log the error if pclose returns "-1" signaling an error occurred */
          syslog(LOG_ERR, "rc is negativ - %s", strerror(errno));
        }
      };
    ...
    

    After some more research and looking at the man page for signal I found that my program was ignoring the SIGCHLD signal as mentioned before. This signal informs a process if a child process stopped or terminated.

    The solution was to add sigaction(SIGCHLD, &newSigAction, NULL); and throwing out the line signal(SIGCHLD, SIG_IGN); which explicitly ignored the signal.

    Here is the working code:

    #include <array> // For std::array
    #include <cstring>
    #include <memory> // For std::unique_ptr
    #include <string>
    #include <sys/stat.h>
    #include <sys/syslog.h> // For all syslog things
    #include <sys/wait.h>
    #include <unistd.h>
    
    void signalHandler(int sig) {
      switch (sig) {
      case SIGINT:
      case SIGTERM:
        break;
      case SIGCHLD:
        /* Some child related action */
        break;
      }
    }
    
    int main(int argc, char *argv[]) {
      /* Open log file to be able to use syslog */
      setlogmask(LOG_UPTO(LOG_DEBUG));
      openlog("MyDemoProg", LOG_PID, LOG_DAEMON);
    
    #if 1 // Set to 0 to disable the daemonizing
    
      pid_t pid = fork();
    
      if (pid < 0)
        exit(EXIT_FAILURE);
    
      if (pid > 0)
        exit(EXIT_SUCCESS);
    
      if (setsid() < 0)
        exit(EXIT_FAILURE);
    
      struct sigaction newSigAction;
      newSigAction.sa_handler = signalHandler;
      sigemptyset(&newSigAction.sa_mask);
      newSigAction.sa_flags = 0;
    
      sigaction(SIGHUP, &newSigAction, NULL);  /* catch hangup signal */
      sigaction(SIGTERM, &newSigAction, NULL); /* catch term signal */
      sigaction(SIGINT, &newSigAction, NULL);  /* catch interrupt signal */
      sigaction(SIGCHLD, &newSigAction,
                NULL); /* catch child stopped or terminated signal */
    
      pid = fork();
      if (pid < 0)
        exit(EXIT_FAILURE);
    
      if (pid > 0)
        exit(EXIT_SUCCESS);
    
      umask(0);
      chdir("/");
      for (int x = sysconf(_SC_OPEN_MAX); x >= 0; x--) {
        close(x);
      }
      syslog(LOG_DEBUG, "Daemonizing is enabled");
    #else
      syslog(LOG_DEBUG, "Daemonizing is disabled");
    #endif
    
      std::string command = "ls /var/bla/; sleep 2; echo test";
      syslog(LOG_DEBUG, "Command is: %s", command.c_str());
    
      int rc = -999;
      std::array<char, 16> buffer;
      std::string commandResult;
    
      // A wrapper function to be able to get the return code while still using the
      // automatic close function wizardy of unique_ptr
      auto pclose_wrapper = [&rc](FILE *cmd) {
        rc = pclose(cmd);
        if (rc < 0) {
          /* Log the error if pclose returns "-1" signaling an error occured */
          syslog(LOG_ERR, "rc is negativ - %s", strerror(errno));
        }
      };
      {
        const std::unique_ptr<FILE, decltype(pclose_wrapper)> pipe(
            popen(command.c_str(), "r"), pclose_wrapper);
    
        if (!pipe) {
          syslog(LOG_ERR, "Could not open pipe! Exiting");
          return EXIT_FAILURE;
        }
    
        /* Read in the pipe and save the content to a buffer */
        while (::fgets(buffer.data(), buffer.size(), pipe.get()) != nullptr) {
          commandResult += buffer.data();
        }
      }
      syslog(LOG_DEBUG, "Command result is: %s", commandResult.c_str());
      syslog(LOG_DEBUG, "Return code is: %d", rc);
    
      return EXIT_SUCCESS;
    }
    

    And here is the output of both the non-daemonized and daemonized version:
    (I added a syslog message to indicate if the daemonizing code was enabled or disabled.)

    May 10 09:24:30 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10872]: Daemonizing is disabled
    May 10 09:24:30 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10872]: Command is: ls /var/bla/; sleep 2; echo test
    May 10 09:24:32 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10872]: Command result is: test
    May 10 09:24:32 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10872]: Return code is: 0
    ---
    May 10 09:24:49 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10881]: Daemonizing is enabled
    May 10 09:24:49 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10881]: Command is: ls /var/bla/; sleep 2; echo test
    May 10 09:24:51 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10881]: Command result is: test
    May 10 09:24:51 MY-EMBEDDED-DEVICE daemon.debug MyDemoProg[10881]: Return code is: 0
    

    Now both versions give the expected return code of "0".