
What causes mkstemp to fail when running many simultaneous valgrind processes?


I'm testing some software with valgrind. Ideally, I would like to have 20 or more instances of valgrind running at once. However, if I run more than 16 instances in parallel, I start getting messages like:

==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_269e37a6
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_d6b675e7
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_db46c594
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_51cd683d
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_86662832
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_226a8983
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_bb94a700
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_532d4b39
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_de4a957e
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_fcc23adf
==30533== VG_(mkstemp): failed to create temp file: /tmp/valgrind_proc_30533_cmdline_f41d332c
valgrind: Startup or configuration error:
valgrind:    Can't create client cmdline file in /pathtomyproject/
valgrind: Unable to start up properly.  Giving up.

Some of the processes (perhaps 1/3 of them) instead terminate with the error:

==30482== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
==30482==
==30482== 1 errors in context 1 of 1:
==30482== Jump to the invalid address stated on the next line
==30482==    at 0x4C6: ???
==30482==    by 0x4005D2E: open_verify (dl-load.c:1914)
==30482==    by 0x4006362: open_path (dl-load.c:2175)
==30482==    by 0x4008799: _dl_map_object (dl-load.c:2407)
==30482==    by 0x400CFE1: openaux (dl-deps.c:65)
==30482==    by 0x400F175: _dl_catch_error (dl-error.c:178)
==30482==    by 0x400D6BD: _dl_map_object_deps (dl-deps.c:258)
==30482==    by 0x400350C: dl_main (rtld.c:1826)
==30482==    by 0x4015B23: _dl_sysdep_start (dl-sysdep.c:244)
==30482==    by 0x4005364: _dl_start (rtld.c:338)
==30482==    by 0x40016B7: ??? (in /lib/x86_64-linux-gnu/ld-2.15.so)
==30482==    by 0x4: ???
==30482==    by 0x7FF0007C6: ???
==30482==    by 0x7FF0007DD: ???
==30482==    by 0x7FF0007E2: ???
==30482==    by 0x7FF0007E9: ???
==30482==    by 0x7FF0007EE: ???
==30482==  Address 0x4c6 is not stack'd, malloc'd or (recently) free'd

While running these calls, no files are created in /tmp, but the user account I'm using does have read, write and execute permissions for /tmp.

I cannot find any information about this bug online, but perhaps someone here knows something about it?

EDIT: Further experimentation suggests that in fact no more than 5 processes can be run at once.


Solution

  • The error comes from here:

    // coregrind/m_libcfile.c
    
    /* Create and open (-rw------) a tmp file name incorporating said arg.
       Returns -1 on failure, else the fd of the file.  If fullname is
       non-NULL, the file's name is written into it.  The number of bytes
       written is guaranteed not to exceed 64+strlen(part_of_name). */
    
    Int VG_(mkstemp) ( HChar* part_of_name, /*OUT*/HChar* fullname )
    {
       HChar  buf[200];
       Int    n, tries, fd;
       UInt   seed;
       SysRes sres;
       const HChar *tmpdir;
    
       vg_assert(part_of_name);
       n = VG_(strlen)(part_of_name);
       vg_assert(n > 0 && n < 100);
    
       seed = (VG_(getpid)() << 9) ^ VG_(getppid)();
    
       /* Determine sensible location for temporary files */
       tmpdir = VG_(tmpdir)();
    
       tries = 0;
       while (True) {
          if (tries++ > 10)
             return -1;
          VG_(sprintf)( buf, "%s/valgrind_%s_%08x",
                        tmpdir, part_of_name, VG_(random)( &seed ));
          if (0)
             VG_(printf)("VG_(mkstemp): trying: %s\n", buf);
    
          sres = VG_(open)(buf,
                           VKI_O_CREAT|VKI_O_RDWR|VKI_O_EXCL|VKI_O_TRUNC,
                           VKI_S_IRUSR|VKI_S_IWUSR);
          if (sr_isError(sres)) {
             VG_(umsg)("VG_(mkstemp): failed to create temp file: %s\n", buf);
             continue;
          }
          /* VG_(safe_fd) doesn't return if it fails. */
          fd = VG_(safe_fd)( sr_Res(sres) );
          if (fullname)
             VG_(strcpy)( fullname, buf );
          return fd;
       }
       /* NOTREACHED */
    }
    

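    As you can see, the loop gives up after 11 failed VG_(open) calls, one for each "failed to create temp file" message in your output. The seed depends only on pid and ppid, so that code will fail if there are more than 10 processes that share the same pid and ppid. It is not clear how you are creating the 20 valgrind processes; they should normally not share a pid.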
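To make the retry logic concrete, here is a minimal standalone sketch in plain C. It reproduces the seeding expression and the name format from the quoted function, and counts how many attempts the loop allows. The names `toy_random`, `candidate_names`, `attempts_before_giving_up` and `MAX_ATTEMPTS` are hypothetical, and `toy_random` is only a stand-in generator: it is an assumption, not a copy of `VG_(random)`. The point is just that any generator seeded this way produces the same sequence of candidate names for the same pid/ppid pair.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* `if (tries++ > 10) return -1;` lets the loop body run 11 times. */
#define MAX_ATTEMPTS 11

/* Stand-in generator (ASSUMPTION): a plain linear congruential step.
   It is not claimed to match VG_(random); it only needs to be
   deterministic for a given seed. */
static unsigned int toy_random(unsigned int *seed)
{
    *seed = 1103515245u * (*seed) + 12345u;
    return *seed;
}

/* Fill names[] with the candidate paths the retry loop would try, in order. */
static void candidate_names(unsigned int pid, unsigned int ppid,
                            const char *tmpdir, const char *part_of_name,
                            char names[MAX_ATTEMPTS][200])
{
    size_t n = strlen(part_of_name);
    assert(n > 0 && n < 100);               /* same bound as the quoted code */

    unsigned int seed = (pid << 9) ^ ppid;  /* same seeding as the quoted code */
    for (int i = 0; i < MAX_ATTEMPTS; i++) {
        snprintf(names[i], 200, "%s/valgrind_%s_%08x",
                 tmpdir, part_of_name, toy_random(&seed));
    }
}

/* Count the attempts the quoted loop makes when every open fails. */
static int attempts_before_giving_up(void)
{
    int tries = 0, attempts = 0;
    while (1) {
        if (tries++ > 10)
            return attempts;
        attempts++;  /* stands for a failing VG_(open) followed by `continue` */
    }
}
```

Calling `candidate_names` twice with the same pid and ppid fills both arrays with identical paths, which is why processes sharing those values would compete for the same file names. Note that the message only reports that the open failed, not why, so the cause could also be something other than a name collision.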

    You might be able to work around the problem by either