c linux multithreading file-locking

Linux fcntl file lock with timeout


The standard Linux fcntl call doesn't provide a timeout option. I'm considering implementing a lock with a timeout using a signal.

Here is the description of the blocking lock command:


F_SETLKW

This command shall be equivalent to F_SETLK except that if a shared or exclusive lock is blocked by other locks, the thread shall wait until the request can be satisfied. If a signal that is to be caught is received while fcntl() is waiting for a region, fcntl() shall be interrupted. Upon return from the signal handler, fcntl() shall return -1 with errno set to [EINTR], and the lock operation shall not be done.
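
For illustration, here is a minimal sketch of how a caller might tell an interrupted F_SETLKW request apart from other failures, based on the description above. The function name wait_for_lock and its return convention are hypothetical, and the sketch assumes a lock on the whole file.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: request a whole-file lock of the given type
 * (F_RDLCK or F_WRLCK) and wait until it can be granted.
 *
 * Returns 0 if the lock was acquired, 1 if the wait was interrupted
 * by a caught signal (EINTR), and -1 on any other error.
 */
int wait_for_lock(int fd, short type) {
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0   /* a length of 0 means "to the end of the file" */
    };

    if (fcntl(fd, F_SETLKW, &fl) == 0) {
        return 0;
    }
    return (errno == EINTR) ? 1 : -1;
}
```

A return value of 1 only says that some caught signal arrived while waiting; the caller still has to decide whether that signal was the timeout.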


So which signal should I use to interrupt the lock attempt? Also, since there are multiple threads running in my process, I only want to interrupt the I/O thread that is blocking on the file lock; other threads should not be affected. But signals are process-level, so I'm not sure how to handle this situation.

Added:

I've written a simple implementation using a signal.

int main(int argc, char **argv) {
  std::string lock_path = "a.lck";

  int fd = open(lock_path.c_str(), O_CREAT | O_RDWR, S_IRWXU | S_IRWXG | S_IRWXO);

  if (argc > 1) {
    signal(SIGALRM, [](int sig) {});
    std::thread([](pthread_t tid, unsigned int seconds) {
      sleep(seconds);
      pthread_kill(tid, SIGALRM);
    }, pthread_self(), 3).detach();
    int ret = file_rwlock(fd, F_SETLKW, F_WRLCK);

    if (ret == -1) std::cout << "FAIL to acquire lock after waiting 3s!" << std::endl;

  } else {
    file_rwlock(fd, F_SETLKW, F_WRLCK);
    while (1);
  }

  return 0;
}

By running ./main followed by ./main a, I expect the first process to hold the lock forever, and the second process to try to acquire the lock and be interrupted after 3 seconds, but the second process just terminates.

Could anyone tell me what's wrong with my code?


Solution

  • So which signal should I use to interrupt the lock attempt?

    The most obvious choice of signal would be SIGUSR1 or SIGUSR2. These are provided to serve user-defined purposes.

    There is also SIGALRM, which would be natural if you're using a timer that produces such a signal to do your timekeeping, and which makes some sense even to generate programmatically, as long as you are not using it for other purposes.
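
Whichever signal is chosen, it needs a handler, and how that handler is installed matters. On Linux, a blocked F_SETLKW request is restarted automatically instead of failing with EINTR when the handler was installed with the SA_RESTART flag, and glibc's signal() sets that flag by default. A minimal sketch of installing a do-nothing handler via sigaction() without SA_RESTART follows. Mapping LOCK_TIMER_SIGNAL to SIGUSR1 is an assumption made for illustration, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Assumption for this sketch: SIGUSR1 is not used for anything else. */
#define LOCK_TIMER_SIGNAL SIGUSR1

/* The handler has nothing to do; it only has to exist so that the
 * signal is caught rather than terminating the process. */
static void lock_timeout_handler(int signo) {
    (void) signo;
}

/* Hypothetical helper: returns 0 on success, -1 on failure. */
int install_lock_timeout_handler(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = lock_timeout_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* deliberately no SA_RESTART */

    return sigaction(LOCK_TIMER_SIGNAL, &sa, NULL);
}
```

Signal dispositions are shared by all threads of a process, so this would be called once, before any thread attempts a timed lock.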

    And since there are multiple threads running in my process, I only want to interrupt the I/O thread that is blocking on the file lock; other threads should not be affected. But signals are process-level, so I'm not sure how to handle this situation.

    You can deliver a signal to a chosen thread in a multithreaded process via the pthread_kill() function. This also stands up well to the case where more than one thread is waiting on a lock at the same time.
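
As a rough illustration of thread-directed delivery, the sketch below starts a worker thread that blocks in nanosleep(), used here only as a stand-in for a blocking fcntl(F_SETLKW) call, and then interrupts that one thread with pthread_kill(). All function names, the SIGUSR1 mapping, and the 200 ms grace period are assumptions of the sketch.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Assumption for this sketch: SIGUSR1 is not used for anything else. */
#define LOCK_TIMER_SIGNAL SIGUSR1

static void noop_handler(int signo) {
    (void) signo;
}

/* Worker: blocks in nanosleep() as a stand-in for fcntl(F_SETLKW) and
 * records the errno it observed (0 if the call was not interrupted). */
static void *blocked_worker(void *arg) {
    int *observed_errno = arg;
    struct timespec long_wait = { .tv_sec = 5 };

    *observed_errno = (nanosleep(&long_wait, NULL) == -1) ? errno : 0;
    return NULL;
}

/* Starts the worker, signals only that thread, and returns the errno
 * value the worker observed, or -1 if the setup failed. */
int interrupt_one_thread(void) {
    struct sigaction sa;
    struct timespec grace = { .tv_nsec = 200 * 1000 * 1000 };
    pthread_t worker;
    int observed_errno = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(LOCK_TIMER_SIGNAL, &sa, NULL) == -1) {
        return -1;
    }
    if (pthread_create(&worker, NULL, blocked_worker, &observed_errno) != 0) {
        return -1;
    }

    nanosleep(&grace, NULL);  /* give the worker time to start blocking */
    if (pthread_kill(worker, LOCK_TIMER_SIGNAL) != 0) {
        return -1;
    }
    pthread_join(worker, NULL);
    return observed_errno;
}
```

The calling thread never runs the handler here, because pthread_kill() targets the worker specifically; other threads in the process would be equally unaffected.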

    With regular kill(), you also have the alternative of making all threads block the chosen signal (with pthread_sigmask(), since the behavior of sigprocmask() is unspecified in a multithreaded process), and then having the thread making the lock attempt unblock it immediately before the attempt. When the chosen signal is delivered to the process, it is received by a thread that is not presently blocking it, if any such thread is available.
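
The blocking and unblocking described above could be sketched as follows. The helper names are hypothetical and SIGUSR1 is again an assumed choice of signal. Because a new thread inherits the signal mask of the thread that creates it, blocking the signal in the main thread before creating any other threads is one way to get it blocked everywhere.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>
#include <signal.h>

/* Assumption for this sketch: SIGUSR1 is not used for anything else. */
#define LOCK_TIMER_SIGNAL SIGUSR1

/*
 * Hypothetical helper: block (blocked != 0) or unblock (blocked == 0)
 * the lock-timeout signal in the calling thread only.
 * Returns 0 on success or an error number, like pthread_sigmask().
 */
int set_lock_signal_blocked(int blocked) {
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, LOCK_TIMER_SIGNAL);
    return pthread_sigmask(blocked ? SIG_BLOCK : SIG_UNBLOCK, &set, NULL);
}

/* Returns 1 if the calling thread currently blocks the signal, else 0. */
int lock_signal_is_blocked(void) {
    sigset_t current;

    /* With a null "set" argument the mask is only queried, not changed. */
    pthread_sigmask(SIG_SETMASK, NULL, &current);
    return sigismember(&current, LOCK_TIMER_SIGNAL);
}
```

The locking thread would call set_lock_signal_blocked(0) just before fcntl(fd, F_SETLKW, ...) and set_lock_signal_blocked(1) right after it returns.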

    Example implementation

    This supposes that a signal handler has already been set up to handle the chosen signal (it doesn't need to do anything), and that the signal number to use is available via the symbol LOCK_TIMER_SIGNAL. It provides the wanted timeout behavior as a wrapper function around fcntl(), with command F_SETLKW as described in the question.

    #define _POSIX_C_SOURCE 200809L
    #define _GNU_SOURCE
    
    #include <unistd.h>
    #include <signal.h>
    #include <time.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/syscall.h>
    
    #if (__GLIBC__ < 2) || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 30)
    // glibc prior to 2.30 does not provide a wrapper 
    // function for this syscall:    
    static pid_t gettid(void) {
        return syscall(SYS_gettid);
    }
    #endif
    
    /**
     * Attempt to acquire an fcntl() lock, with timeout
     *
     * fd: an open file descriptor identifying the file to lock
     * lock_info: a pointer to a struct flock describing the wanted lock operation
     * to_secs: a time_t representing the amount of time to wait before timing out
     */    
    int try_lock(int fd, struct flock *lock_info, time_t to_secs) {
        int result;
        timer_t timer;
    
        result = timer_create(CLOCK_MONOTONIC,
                & (struct sigevent) {
                    .sigev_notify = SIGEV_THREAD_ID,
                    ._sigev_un = { ._tid = gettid() },
                    // note: gettid() conceivably can fail
                    .sigev_signo = LOCK_TIMER_SIGNAL },
                &timer);
        // detect and handle errors ...
    
        result = timer_settime(timer, 0,
                & (struct itimerspec) { .it_value = { .tv_sec = to_secs } },
                NULL);
    
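        int lock_result = fcntl(fd, F_SETLKW, lock_info);
        // detect and handle errors (other than EINTR) ...
        // on EINTR, may want to check that the timer in fact expired
    
        result = timer_delete(timer);
        // detect and handle errors ...
    
        // report the outcome of the lock attempt, not of timer_delete();
        // if the caller inspects errno, save it before timer_delete()
        // and restore it afterwards
        return lock_result;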
    }
    

    That works as expected for me.

    Notes: