go concurrency fork goroutine

Go Program Stuck at syscall.Wait4 in Concurrent Forking Loop


I'm working on a Go program that creates child processes using syscall.RawSyscall(syscall.SYS_FORK) in a concurrent loop. Each child process needs to execute a command (/bin/ls) with specific seccomp and rlimit restrictions applied. The parent process should wait for all child processes to finish using syscall.Wait4. However, I'm encountering an issue where the program gets stuck at syscall.Wait4(int(r1), nil, 0, nil). Here's the relevant part of the code:

package main

import (
    "fmt"
    "os"
    "os/exec"
    "sync"
    "syscall"
)

const n = 100

func main() {
    var wg sync.WaitGroup
    wg.Add(n)
    for range n {
        go func() {
            r1, _, err := syscall.RawSyscall(syscall.SYS_FORK, 0, 0, 0)
            if err != 0 {
                println("Error: ", err)
                panic(err)
            }
            if r1 == 0 {
                // Apply seccomp and rlimit restrictions here

                cmd := exec.Command("/bin/ls", "ls")
                cmd.Run()
                os.Exit(0)
            } else {
                fmt.Println(int(r1))
                syscall.Wait4(int(r1), nil, 0, nil)
                wg.Done()
            }
        }()
    }
    wg.Wait()
    fmt.Println("Done")
}

If I remove the syscall.Wait4 line, the program no longer gets stuck, but it also doesn't wait for all its children to finish, which is not the desired behavior. I need to ensure that the parent process waits for all child processes to finish, and that the seccomp and rlimit restrictions are properly applied to each child process.

I used ChatGPT to help me generate a C++ version that does the same thing, and it works as expected. Here is the C++ code I tested with:

#include <iostream>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>
#include <thread>

const int n = 100;

void forkAndExec() {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        // Child process
        execl("/bin/ls", "ls", nullptr);
        // If execl is successful, this line won't be executed
        perror("execl");
        exit(EXIT_FAILURE);
    } else {
        // Parent process
        std::cout << "Child PID: " << pid << std::endl;
        int status;
        waitpid(pid, &status, 0);
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back(forkAndExec);
    }

    for (auto &t : threads) {
        t.join();
    }

    std::cout << "Done" << std::endl;
    return 0;
}

Can anyone help me understand why the program is getting stuck at syscall.Wait4 and how to fix it so that the parent process waits for all child processes to finish before exiting? Additionally, any guidance on correctly applying the seccomp and rlimit restrictions to the child processes would be greatly appreciated.

OS: Debian 12

Kernel version: 6.1.0-18-amd64

I tried calling runtime.Gosched(), with no luck. I also tried limiting the number of processes spawned concurrently to fewer than the number of CPU cores I have. That helps a bit, but I still run into the hang occasionally.


Solution

  • You can't simply fork a Go program with fork(2):

    The child process is created with a single thread—the one that called fork().

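    Go's scheduler multiplexes goroutines onto several OS threads, and none of those other threads exist in the child, so the runtime stops working there: the child can hang before it ever runs /bin/ls, and the parent then blocks in Wait4 forever. The C++ version does not have this problem, since the C++ runtime does not spawn threads of its own by default and the child calls execl right after fork().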

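    Use syscall.ForkExec instead: it forks and execs in a single call, so you can have it start a small wrapper program that sets the rlimits and then execs the real command (the seccomp filter can be installed in the same place).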

    Note also that you should check the errors these calls return; the code in the question ignores the results of cmd.Run() and syscall.Wait4.