Tags: c, linux, multithreading, clone, lightweight-processes

Strange behavior of clone


This is a fairly simple application that creates a lightweight process (thread) with a clone() call.

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

#define STACK_SIZE (1024*1024)

int func(void* param) {
    printf("I am func, pid %d\n", getpid());    
    return 0;
}

int main(int argc, char const *argv[]) {
    printf("I am main, pid %d\n", getpid());
    void* ptr = malloc(STACK_SIZE);

    printf("I am calling clone\n");             
    int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
    // works fine with sleep() call
    // sleep(1);

    if (res == -1) {
        printf("clone error: %d\n", errno);
    } else {
        printf("I created child with pid: %d\n", res);      
    }

    printf("Main done, pid %d\n", getpid());        
    return 0;
}

Here are results:

Run 1:

➜  LFD401 ./clone
I am main, pid 10974
I am calling clone
I created child with pid: 10975
Main done, pid 10974
I am func, pid 10975

Run 2:

➜  LFD401 ./clone
I am main, pid 10995
I am calling clone
I created child with pid: 10996
I created child with pid: 10996
I am func, pid 10996
Main done, pid 10995

Run 3:

➜  LFD401 ./clone
I am main, pid 11037
I am calling clone
I created child with pid: 11038
I created child with pid: 11038
I am func, pid 11038
I created child with pid: 11038
I am func, pid 11038
Main done, pid 11037

Run 4:

➜  LFD401 ./clone
I am main, pid 11062
I am calling clone
I created child with pid: 11063
Main done, pid 11062
Main done, pid 11062
I am func, pid 11063

What is going on here? Why is the "I created child" message sometimes printed several times?

I also noticed that adding a delay after the clone call "fixes" the problem.


Solution

  • You have a race condition: with clone, you don't get the thread safety that stdio normally implies.

    The problem is more severe than duplicate parent messages: you can get duplicate "func" messages as well.

    The underlying issue is that clone does not provide the same guarantees as pthread_create; that is, you do not get the thread-safe behavior of printf.

    I don't know for sure, but IMO the wording about stdio streams and thread safety only applies, in practice, when using pthreads.

    So, you'll have to handle your own inter-thread locking.

    Here is a version of your program recoded to use pthread_create. It seems to work without incident:

    #define _GNU_SOURCE
    
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <time.h>
    #include <pthread.h>
    
    #define STACK_SIZE 1024*1024
    
    void *func(void* param) {
        printf("I am func, pid %d\n", getpid());
        return (void *) 0;
    }
    
    int main(int argc, char const *argv[]) {
        printf("I am main, pid %d\n", getpid());
        void* ptr = malloc(STACK_SIZE);
    
        printf("I am calling clone\n");
    
        pthread_t tid;
        pthread_create(&tid,NULL,func,NULL);
        //int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
        int res = 0;
    
        // works fine with sleep() call
        // sleep(1);
    
        if (res == -1) {
            printf("clone error: %d", errno);
        } else {
            printf("I created child with pid: %d\n", res);
        }
    
        pthread_join(tid,NULL);
        printf("Main done, pid %d\n", getpid());
        return 0;
    }
    

    Here is a test script I've been using to check for errors (it's a little rough, but should be okay). Run it against your version and it will abort quickly. The pthread_create version seems to pass just fine.

    #!/usr/bin/perl
    # clonetest -- clone test
    #
    # arguments:
    #   "-p0" -- suppress check for duplicate parent messages
    #   "-c0" -- suppress check for duplicate child messages
    #   1 -- base name for program to test (e.g. for xyz.c, use xyz)
    #   2 -- [optional] number of test iterations (DEFAULT: 100000)
    
    master(@ARGV);
    exit(0);
    
    # master -- master control
    sub master
    {
        my(@argv) = @_;
        my($arg,$sym);
    
        while (1) {
            $arg = $argv[0];
            last unless (defined($arg));
    
            last unless ($arg =~ s/^-(.)//);
            $sym = $1;
    
            shift(@argv);
    
            $arg = 1
                if ($arg eq "");
    
            $arg += 0;
            ${"opt_$sym"} = $arg;
        }
    
        $opt_p //= 1;
        $opt_c //= 1;
        printf("clonetest: p=%d c=%d\n",$opt_p,$opt_c);
    
        $xfile = shift(@argv);
        $xfile //= "clone1";
        printf("clonetest: xfile='%s'\n",$xfile);
    
        $itermax = shift(@argv);
        $itermax //= 100000;
        $itermax += 0;
        printf("clonetest: itermax=%d\n",$itermax);
    
        system("cc -o $xfile -O2 $xfile.c -lpthread");
        $code = $? >> 8;
        die("master: compile error\n")
            if ($code);
    
        $logf = "/tmp/log";
    
        for ($iter = 1;  $iter <= $itermax;  ++$iter) {
            printf("iter: %d\n",$iter)
                if ($opt_v);
            dotest($iter);
        }
    }
    
    # dotest -- perform single test
    sub dotest
    {
        my($iter) = @_;
        my($parcnt,$cldcnt);
        my($xfsrc,$bf);
    
        system("./$xfile > $logf");
    
        open($xfsrc,"<$logf") or
            die("dotest: unable to open '$logf' -- $!\n");
    
        while ($bf = <$xfsrc>) {
            chomp($bf);
    
            if ($opt_p) {
                while ($bf =~ /created/g) {
                    ++$parcnt;
                }
            }
    
            if ($opt_c) {
                while ($bf =~ /func/g) {
                    ++$cldcnt;
                }
            }
        }
    
        close($xfsrc);
    
        if (($parcnt > 1) or ($cldcnt > 1)) {
            printf("dotest: fail on %d -- parcnt=%d cldcnt=%d\n",
                $iter,$parcnt,$cldcnt);
            system("cat $logf");
            exit(1);
        }
    }
    

    UPDATE:

    "Were you able to recreate OP's problem with clone?"

    Absolutely. Before I created the pthreads version, in addition to testing OP's original version, I also created versions that:

    (1) added setlinebuf to the start of main

    (2) added fflush just before the clone and __fpurge as the first statement of func

    (3) added an fflush in func before the return 0

    Version (2) eliminated the duplicate parent messages, but the duplicate child messages remained.

    If you'd like to see this for yourself, download OP's version from the question, my version, and the test script. Then, run the test script on OP's version.

    I posted enough information and files so that anyone can recreate the problem.

    Note that due to differences between my system and OP's, I couldn't reproduce the problem at first with just 3-4 tries. That's why I created the script.

    The script does 100,000 test runs, and the problem usually manifests itself within 5000-15000 runs.