clinuxglibcrhel7

getifaddrs returning 'bad file descriptor'/crashing the application


In my program, I have a thread which has to continuously monitor the network interfaces therefore it continuosly uses getifaddrs() in a while loop.

    while(true) {
    
        struct ifaddrs *ifaddr, *ifa;
        if (getifaddrs(&ifaddr) == -1) {
            perror("getifaddrs couldn't fetch required data");
            exit(EXIT_FAILURE);
        }
  
        //Iterate through interfaces linked list
        for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
        //monitoring logic
        }

       //Free linked list
       freeifaddrs(ifaddr);

       //Sleep for specified time fo next polling cycle
       usleep(1000);
    
    }

Most of the time my program works fine. However, sometimes getifaddrs() returns -1 and errNo = EBADF(bad file descriptor). In order to not exit my thread, I have temporarily replaced exit with continue(as I don't want my program to end due to this). However, I'm curious to know in which cases can getifaddrs() return 'bad file descriptor' error and whether I can do something so that this does not happen?

EDIT

replacing 'exit' with 'continue' didn't solve my problem. Sometimes the call to getifaddrs() is crashing the application!

Given below is the backtrace obtained from gdb using the generated core file.

Program terminated with signal 6, Aborted.
#0  0x00007fe2df1ef387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007fe2df1ef387 in raise () from /lib64/libc.so.6
#1  0x00007fe2df1f0a78 in abort () from /lib64/libc.so.6
#2  0x00007fe2df231ed7 in __libc_message () from /lib64/libc.so.6
#3  0x00007fe2df231fbe in __libc_fatal () from /lib64/libc.so.6
#4  0x00007fe2df2df4c2 in __netlink_assert_response () from /lib64/libc.so.6
#5  0x00007fe2df2dc412 in __netlink_request () from /lib64/libc.so.6
#6  0x00007fe2df2dc5ef in getifaddrs_internal () from /lib64/libc.so.6
#7  0x00007fe2df2dd310 in getifaddrs () from /lib64/libc.so.6
#8  0x000000000047c03c in __interceptor_getifaddrs.part.0 ()

Operating system: Red Hat Enterprise Linux Server release 7.8 (Maipo)

GLIBC version: 2.17


Solution

  • https://patchwork.ozlabs.org/project/netdev/patch/5638B93F.3090202@redhat.com/

    in the link it says the reason of crash is. "The recvmsg system calls for netlink sockets have been particularly prone to picking up unrelated data after a file descriptor race (where the descriptor is closed and reopened concurrently in a multi-threaded process, as the result of a file descriptor management issue elsewhere).".

    So i think you need either dont use a seperate thread or use some locking mechanism around netlink functions.

    At least just confirm that it still crash or not when you monitor the network interfaces in main thread.