I have some very complex application. It is composition of couple libraries.
Now QA team found the some problem (something reports an error).
Fromm logs I can see that application is leaking a file descriptors (+1000 after 7 hours of automated tests).
QA team has delivered rapport "opened files and ports" from "Activity monitor" and I know exactly to which server connection is not closed.
From full application logs I can see that leak is quite systematic (there is no sudden burst), but I was unable to reproduce issue to see even a small leak of file descriptors.
Even thou I'm sure for which server connection is never closed, I'm unable to find code responsible.
I'm unable reproduce issue.
In logs I can see that all resources my library maintains are properly freed, still server address suggest this is my responsibility or NSURLSession
(which is invalidated).
Since there are other libraries and application code it self there is small chance that leak is caused by third party code.
How to locate code responsible for leaking file descriptor?
Best candidate is use dtruss
which looks very promising.
From documentation I can see it can print stack backtraces -s
when system API is used.
Problem is that I do not know how to use this in such way that I will not get flooded with information.
I need only information who created opened file descriptor and if it was closed destroyed.
Since I can't reproduce issue I need a script which could be run by QA team so the could deliver me an output.
If there are other ways to find the source of file descriptor leak please let me know.
There is bunch of predefined scripts which are using dtruss
, but I don't see anything what is matching my needs.
What is strange the only code I'm aware is using problematic connection, do not use file descriptors directly, but uses custom NSURLSession
(configured as: one connection per host, minimum TLS 1.0, disable cookies, custom certificate validation). From logs I can see NSURLSession
is invalidated properly. I doubt NSURLSession
is source of leak, but currently this is the only candidate.
OK, I found out how to do it - on Solaris 11, anyway. I get this output (and yes, I needed root
on Solaris 11):
bash-4.1# dtrace -s fdleaks.d -c ./fdLeaker
open( './fdLeaker' ) returned 3
open( './fdLeaker' ) returned 4
open( './fdLeaker' ) returned 5
falloc fp: ffffa1003ae56590, fd: 3, saved fd: 3
falloc fp: ffffa10139d28f58, fd: 4, saved fd: 4
falloc fp: ffffa10030a86df0, fd: 5, saved fd: 5
opened file: ./fdLeaker
leaked fd: 3
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x2b
fdLeaker`_start+0x72
opened file: ./fdLeaker
leaked fd: 4
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x64
fdLeaker`_start+0x72
The fdleaks.d
dTrace script that finds leaked file descriptors:
#!/usr/sbin/dtrace
/* this will probably need tuning
note there can be significant performance
impacts if you make these large */
#pragma D option nspec=4
#pragma D option specsize=128k
#pragma D option quiet
syscall::open*:entry
/ pid == $target /
{
/* arg1 might not have a physical mapping yet so
we can't call copyinstr() until open() returns
and we don't have a file descriptor yet -
we won't get that until open() returns anyway */
self->path = arg1;
}
/* arg0 is the file descriptor being returned */
syscall::open*:return
/ pid == $target && arg0 >= 0 && self->path /
{
/* get a speculation ID tied to this
file descriptor and start speculative
tracing */
openspec[ arg0 ] = speculation();
speculate( openspec[ arg0 ] );
/* this output won't appear unless the associated
speculation id is commited */
printf( "\nopened file: %s\n", copyinstr( self->path ) );
printf( "leaked fd: %d\n\n", arg0 );
ustack();
/* free the saved path */
self->path = 0;
}
syscall::close:entry
/ pid == $target && arg0 >= 0 /
{
/* closing the fd, so discard the speculation
and free the id by setting it to zero */
discard( openspec[ arg0 ] );
openspec[ arg0 ] = 0;
}
/* Solaris uses falloc() to open a file and associate
the fd with an internal file_t structure
When the kernel closes file descriptors that the
process left open, it uses the closeall() function
which walks the internal structures then calls
closef() using the file_t *, so there's no way
to get the original process file descritor in
closeall() or closef() dTrace probes.
falloc() is called on open() to associate the
file_t * with a file descriptor, so this
saves the pointers passed to falloc()
that are used to return the file_t * and
file descriptor once they're filled in
when falloc() returns */
fbt::falloc:entry
/ pid == $target /
{
self->fpp = args[ 2 ];
self->fdp = args[ 3 ];
}
/* Clause-local variables to make casting clearer */
this int fd;
this uint64_t fp;
/* array to associate a file descriptor with its file_t *
structure in the kernel */
int fdArray[ uint64_t fp ];
fbt::falloc:return
/ pid == $target && self->fpp && self->fdp /
{
/* get the fd and file_t * values being
returned to the caller */
this->fd = ( * ( int * ) self->fdp );
this->fp = ( * ( uint64_t * ) self->fpp );
/* associate the fd with its file_t * */
fdArray[ this->fp ] = ( int ) this->fd;
/* verification output */
printf( "falloc fp: %x, fd: %d, saved fd: %d\n", this->fp, this->fd, fdArray[ this->fp ] );
}
/* if this gets called and the dereferenced
openspec array element is a still-valid
speculation id, the fd associated with
the file_t * passed to closef() was never
closed by the process itself */
fbt::closef:entry
/ pid == $target /
{
/* commit the speculative tracing since
this file descriptor was leaked */
commit( openspec[ fdArray[ arg0 ] ] );
}
First, I wrote this little C program to leak fds:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main( int argc, char **argv )
{
int ii;
for ( ii = 0; ii < argc; ii++ )
{
int fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
close( fd );
}
return( 0 );
}
Then I ran it under this dTrace script to figure out what the kernel does to close orphaned file descriptors, dtrace -s exit.d -c ./fdLeaker
:
#!/usr/sbin/dtrace -s
#pragma D option quiet
syscall::rexit:entry
{
self->exit = 1;
}
syscall::rexit:return
/ self->exit /
{
self->exit = 0;
}
fbt:::entry
/ self->exit /
{
printf( "---> %s\n", probefunc );
}
fbt:::return
/ self->exit /
{
printf( "<--- %s\n", probefunc );
}
That produced a lot of output, and I noticed closeall()
and closef()
functions, examined the source code, and wrote the dTrace script.
Note also that the process exit dTrace probe on Solaris 11 is the rexit
one - that probably changes on OSX.
The biggest problem on Solaris is getting the file descriptor for the file in the kernel code that closes orphaned file descriptors. Solaris doesn't close by file descriptor, it closes by struct file_t
pointers in the kernel open files structures for the process. So I had to examine the Solaris source to figure out where the fd is associated with the file_t *
- and that's in the falloc()
function. The dTrace script associates a file_t *
with its fd in an associative array.
None of that is likely to work on OSX.
If you're lucky, the OSX kernel will close orphaned file descriptors by the file descriptor itself, or at least provide something that tells you the fd is being closed, perhaps an auditing function.