Tags: c, debugging, pipe, ipc, blocking

Process hangs on write to a pipe


I am writing a program that uses pipes for inter-process communication, but I have run into a problem: a write blocks the process even though there is enough space in the pipe.

I am working on a remote host where the pipe buffer size is 8192 bytes, which I determined with the following program:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main() {

    int fd[2];
    if (pipe(fd) == -1) {
        perror("pipe");
        return 1;
    }

    // F_GETPIPE_SZ is Linux-specific, hence _GNU_SOURCE.
    printf("Pipe size: %d\n", fcntl(fd[1], F_GETPIPE_SZ));

    close(fd[1]);
    close(fd[0]);

    return 0;
}

In the example below, I create 16 processes, each with its own pipe. Each process then writes 512 B to the pipe of each of its children, and the children read these messages. The root is labeled 0, and the children of process k are numbered 2k+1 and 2k+2. Finally, each process sends one message to every pipe.

Therefore, 16 * 512 B = 8192 B will be written to the root's pipe, and (16 + 1) * 512 B = 8192 B + 512 B to every other pipe; but the one extra message will be read by the owning child, so the whole thing should fit in the pipe.

MRE (This example doesn't do anything useful; it only illustrates my problem):

#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

#define NO_OF_PROCESSES 16
#define NO_OF_MESSAGES 1
#define ROOT 0

#define ERROR_CHECK(result) \
do { \
    if ((result) == -1) { \
        fprintf(stderr, "Error at line %d\n", __LINE__); \
        exit(1); \
    } \
} while (0)

#define NOT_PARTIAL(result) \
do { \
    if ((result) != 512) { \
        fprintf(stderr, "Error at line %d\n", __LINE__); \
        exit(1); \
    } \
} while (0)

void close_pipes(int fd[NO_OF_PROCESSES][2]) {
    for (int i = 0; i < NO_OF_PROCESSES; i++) {
        ERROR_CHECK(close(fd[i][0]));
        ERROR_CHECK(close(fd[i][1]));
    }
}

void child_code(int fd[NO_OF_PROCESSES][2], int child_id) {

    void* message = malloc(512);
    if (message == NULL)
        exit(EXIT_FAILURE);

    memset(message, 0, 512);

    int l = 2 * child_id + 1;
    int r = 2 * child_id + 2;

    // Every parent sends NO_OF_MESSAGES messages (512 B each) to each of its children.
    if (child_id == ROOT || l < NO_OF_PROCESSES) { // Root or any other parent.

        if (child_id != ROOT)
            for (int i = 0; i < NO_OF_MESSAGES; i++)
                NOT_PARTIAL(read(fd[child_id][0], message, 512));

        if (l < NO_OF_PROCESSES)
            for (int i = 0; i < NO_OF_MESSAGES; i++)
                NOT_PARTIAL(write(fd[l][1], message, 512));

        if (r < NO_OF_PROCESSES)
            for (int i = 0; i < NO_OF_MESSAGES; i++)
                NOT_PARTIAL(write(fd[r][1], message, 512));
    }
    else { // Leaf.
        for (int i = 0; i < NO_OF_MESSAGES; i++)
            NOT_PARTIAL(read(fd[child_id][0], message, 512));
    }

    printf("Ok, process %d\n", child_id);

    // Process sends one message to every other process.
    for (int i = 0; i < NO_OF_PROCESSES; i++) {
        int pipe_size = 0;
        ERROR_CHECK(ioctl(fd[i][1], FIONREAD, &pipe_size));
        printf("Check_1, process %d, there are %d bytes in the pipe, iteration %d\n", child_id, pipe_size, i);
        NOT_PARTIAL(write(fd[i][1], message, 512));
        printf("Check_2, process %d\n", child_id);
        fflush(stdout);
    }

    free(message);

    printf("Finished, process %d\n", child_id);
}

int main() {

    // Each child has its own pipe.
    int fd[NO_OF_PROCESSES][2];
    for (int i = 0; i < NO_OF_PROCESSES; i++) {
        ERROR_CHECK(pipe(fd[i]));
    }

    // Creating children processes.
    for (int i = 0; i < NO_OF_PROCESSES; i++) {

        int fork_result = fork();
        ERROR_CHECK(fork_result);

        if (fork_result == 0) { // Child process.
            child_code(fd, i);
            close_pipes(fd);
            return 0;
        }
    }

    close_pipes(fd);

    // Waiting for all children to finish.
    for (int i = 0; i < NO_OF_PROCESSES; i++) {
        ERROR_CHECK(wait(NULL));
    }

    return 0;
}

Currently, the program does not terminate because some of the processes hang.

Last lines of the output:

Ok, process 12
Check_1, process 2, there are 7168 bytes in the pipe, iteration 15
Check_2, process 2
Check_1, process 12, there are 7680 bytes in the pipe, iteration 0
Finished, process 2
Check_2, process 12
Check_1, process 12, there are 7680 bytes in the pipe, iteration 1

As you can see, the Check_2 line for process 12 is missing, so that process hangs in write(), even though Ok appears 16 times in the full output, which in theory means that all of the messages 'from the tree' have been read.

The program works for 15 or fewer processes, because then at most 8192 B ever goes into any pipe. Likewise, the code works on a system where the pipe has a larger capacity.
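
If it helps anyone reproduce this: on Linux the pipe capacity can be changed with fcntl and F_SETPIPE_SZ, so a pipe can be forced down to 8192 bytes even if the default is larger. A minimal sketch (assuming a kernel that supports F_SETPIPE_SZ):

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main() {

    int fd[2];
    if (pipe(fd) == -1) {
        perror("pipe");
        return 1;
    }

    // Ask for an 8192-byte capacity; the kernel may round the request up.
    if (fcntl(fd[1], F_SETPIPE_SZ, 8192) == -1)
        perror("F_SETPIPE_SZ");

    printf("Pipe size now: %d\n", fcntl(fd[1], F_GETPIPE_SZ));

    close(fd[1]);
    close(fd[0]);

    return 0;
}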

Where am I making a mistake? Why does the process hang? If my code works for you, perhaps your pipes have a different buffer size.

Recently (rather clumsily) I asked a similar question. I am adding a new post instead of editing the old one because the entire content would change, and the existing answers would no longer make sense. I hope this post is better.

Thanks a lot.


Solution

  • Where am I making a mistake?

    Since your program's output seems to show that there is enough space in the pipe buffer to accommodate the data the last process is trying to write, yet the write hangs anyway, there are only a few reasonable explanations:

    1. Your system has some additional limitation for which you have not accounted, for example a limit on the aggregate amount of data buffered across all of your pipes at any given time.

    2. Your system has a bug that your program manages to trigger.

    You haven't provided any system details, so I cannot be any more specific. I will note, however, that even after adjusting for the pipe buffer size on my system (65536 bytes), I cannot reproduce your program's hang. Thus, I do think that the behavior you observe is system-specific.
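
    If the remote host is Linux, one aggregate limit worth ruling out is the per-user cap on pipe buffer pages described in pipe(7). The sketch below merely prints the relevant /proc knobs; it assumes Linux, and the files may be absent elsewhere:

    #include <stdio.h>

    // Print the Linux-specific pipe limits from /proc (see pipe(7)).
    // The pipe-user-pages-* values are in pages; pipe-max-size is in bytes.
    static void show(const char *path) {
        FILE *f = fopen(path, "r");
        if (f == NULL) {
            printf("%s: unavailable\n", path);
            return;
        }
        long value;
        if (fscanf(f, "%ld", &value) == 1)
            printf("%s: %ld\n", path, value);
        fclose(f);
    }

    int main() {
        show("/proc/sys/fs/pipe-max-size");
        show("/proc/sys/fs/pipe-user-pages-soft");
        show("/proc/sys/fs/pipe-user-pages-hard");
        return 0;
    }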

    Nevertheless, I can answer the question at a high level: your mistake is writing data to pipes when you have no expectation that it will be read. Pipes are a data transfer mechanism, not a data storage mechanism. It is incumbent on you as the programmer to ensure that, to the extent it is within your control, data you write to your pipes will also be consumed from them.

    Addendum

    As a secondary, more pragmatic matter, leaving pipe ends open past the point where they are needed can cause a variety of issues. In this light, it is good that the parent process closes its copies of all the pipe ends as soon as it finishes forking the children. But each child ought, at startup, to close the read ends of all the pipes other than its own, and to close the write end of its own pipe. I expect that if the children did this, the program would not hang; instead, some of the writes would fail with EPIPE, as they should.
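
    For illustration, here is a minimal sketch of that startup housekeeping, written against your child_code; the helper name is mine, and I also ignore SIGPIPE so that a write to a pipe with no remaining readers reports EPIPE rather than terminating the writer:

    #include <signal.h>

    // Hypothetical helper, called first thing in child_code(): each child
    // keeps only the read end of its own pipe and the write ends of the
    // other pipes, and closes everything else.
    void close_unneeded_ends(int fd[NO_OF_PROCESSES][2], int child_id) {
        signal(SIGPIPE, SIG_IGN); // let write() report EPIPE instead
        for (int i = 0; i < NO_OF_PROCESSES; i++)
            if (i != child_id)
                ERROR_CHECK(close(fd[i][0])); // other pipes' read ends
        ERROR_CHECK(close(fd[child_id][1]));  // write end of our own pipe
    }

    (With this in place, the final close_pipes() call in the child branch would need to skip the descriptors already closed here.)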