
C - Linux proc file syncing / write deferred


I have a simple watchdog mechanism implemented as follows:

time_t timer = time(NULL);
int n = 0;
while (true) {
    /* Send the reset command for this thread every 10 seconds. */
    if (time(NULL) >= timer + 10) {
        timer = time(NULL);
        char szData[64];
        /* snprintf always NUL-terminates and cannot overflow szData. */
        snprintf(szData, sizeof(szData), "s|%d|%s", tid, function_name);
        if ((n = strlen(szData)) > 0) {
            int t = 0;
            int fp = 0;
            /* Retry the open up to 5 times while access is denied. */
            while ((fp = open("/proc/counters", O_WRONLY | O_EXCL)) == -1 && errno == EACCES) {
                if (++t > 4) {
                    break;
                }
                usleep(260000);
            }
            if (fp == -1) {
                return 1;
            }
            write(fp, szData, n);
            close(fp);
        }
    }
}

With this setup, no counter should ever exceed 10, but sometimes some of them reach 20, then 30, and so on. On the kernel module side, I can see that the reset command does not arrive in time and that, on the next round, the module receives two commands within the same second, as if the first had been delayed. Example:

Content of the /proc/counters file

    *856 7 20/600 thread_function_name

Debug prints by the kernel module

Aug 26 11:16:38 XWEB-PRO kern.info kernel: [ 114.086453] register 856
Aug 26 11:16:38 XWEB-PRO kern.info kernel: [ 124.138523] register 856
Aug 26 11:16:58 XWEB-PRO kern.info kernel: [ 134.190508] register 856
Aug 26 11:16:58 XWEB-PRO kern.info kernel: [ 144.242274] register 856
Aug 26 11:16:58 XWEB-PRO kern.info kernel: [ 144.242277] register 856
Aug 26 11:17:08 XWEB-PRO kern.info kernel: [ 154.294433] register 856
Aug 26 11:17:28 XWEB-PRO kern.info kernel: [ 164.346516] register 856
Aug 26 11:17:28 XWEB-PRO kern.info kernel: [ 174.398552] register 856
Aug 26 11:17:38 XWEB-PRO kern.info kernel: [ 184.468022] register 856
Aug 26 11:17:48 XWEB-PRO kern.info kernel: [ 194.522241] register 856

As you can see, I missed the command at 11:16:48 but got 3 at 11:16:58. Then I missed the one at 11:17:18 but got 2 at 11:17:28. I have already tried fflush, fsync and sync, with no luck. Can anybody point me in the right direction? Thank you.


Solution

  • I finally spotted the problem: it looks like a race condition. Ten threads write to the /proc file at the same time, so some write operations were probably missed by the kernel module (indeed, the reset always happened at a multiple of 10 seconds). I wrapped the open/write/close sequence in user space in a mutex, and the problem seems to be gone.
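
The fix described above could look roughly like the following. This is only a minimal sketch, not the original code: the helper name `locked_write`, the lock name `counters_lock`, and the plain `O_WRONLY` open flag are assumptions made for illustration. The bounded retry on `EACCES` mirrors the loop from the question.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* One process-wide lock shared by every watchdog thread (hypothetical name). */
static pthread_mutex_t counters_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: runs open/write/close as a single critical section,
 * so only one thread at a time talks to the file.
 * Returns 0 if all len bytes were written, -1 otherwise.
 */
int locked_write(const char *path, const char *data, size_t len)
{
    int rc = -1;
    int fd = -1;

    pthread_mutex_lock(&counters_lock);

    /* Up to 5 attempts while access is denied, as in the question.
     * Note that the other threads stay blocked while this one sleeps. */
    for (int attempt = 0; attempt < 5; ++attempt) {
        fd = open(path, O_WRONLY);
        if (fd != -1 || errno != EACCES)
            break;
        if (attempt < 4)
            usleep(260000);
    }

    if (fd != -1) {
        ssize_t written = write(fd, data, len);
        if (written == (ssize_t)len)
            rc = 0;
        close(fd);
    }

    pthread_mutex_unlock(&counters_lock);
    return rc;
}
```

Each thread would then call `locked_write("/proc/counters", szData, n)` in place of the inline open/write/close sequence. A user-space mutex only serializes the threads of one process; if other processes also write to the file, or if the module's write handler itself is not safe against concurrent calls, locking on the kernel side would still be needed.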