linux bash file-descriptor lsof

Who is writing to my file, or why did lsof display the reader but not the writer?


I did the following actions on two separate terminals:

1st term:

Start a process that writes to a file in the background:

└──> while true; do date >> log; sleep 1; done &
[1] 20604

Get the PID of the last process started in the background:

└──> echo $!
20604

2nd term:

Display the content of the file being written:

└──> tail -f log
Thu May  7 18:48:20 CEST 2015
Thu May  7 18:48:21 CEST 2015
Thu May  7 18:48:22 CEST 2015
Thu May  7 18:48:23 CEST 2015
Thu May  7 18:48:24 CEST 2015
Thu May  7 18:48:25 CEST 2015
Thu May  7 18:48:26 CEST 2015
Thu May  7 18:48:27 CEST 2015

1st term:

Check who is accessing the file (note that only the reader is listed):

└──> lsof log
COMMAND   PID  USER      FD   TYPE DEVICE SIZE/OFF   NODE NAME
tail    21038  wakatana   3r   REG    8,1     5340 797966 log

After the following kill, the tail -f on the 2nd terminal terminated and lsof returned empty output:

└──> kill 21038
└──> lsof log
└──>

2nd term:

Then I started tail -f again and saw that data was still being written to the log file. This means that some process is still writing to it:

└──> tail -f log
Thu May  7 18:52:33 CEST 2015
Thu May  7 18:52:34 CEST 2015
Thu May  7 18:52:35 CEST 2015
Thu May  7 18:52:36 CEST 2015
Thu May  7 18:52:37 CEST 2015
Thu May  7 18:52:38 CEST 2015
Thu May  7 18:52:39 CEST 2015
Thu May  7 18:52:40 CEST 2015

In this case I actually know the PID of the mysterious process that is writing to the file: it is PID 20604, so I can kill it and the log file will stop growing.

My questions are:

  1. Why did lsof not display the process that is actually writing to the log file, even when run repeatedly? I understand that PID 20604 belongs to bash, and that it is not bash itself that writes to the file but its child, date. But lsof displayed neither bash nor date.
  2. What if I did not know PID 20604? How could I track down the writing process then?

PS: The shell used: GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)


Solution

  • You have a classic engineering problem in the form of asynchronous sampling here.

    Essentially, after each long wait, a process is spun up very quickly, writes to the file, and then dies.

    Entirely asynchronously to that, you run lsof, which looks for open files, but effectively only at one instant in time, and that instant will probably not coincide with a write. (In actuality, lsof performs a multi-step operation, but there is probably only a single opportunity to catch any given writer.)

    It might be tempting to think that if you ran lsof enough times in a loop, you would eventually catch the writer in the act, and perhaps you will. However, depending on how your system's scheduler and I/O functionality work, the writing process may be so short-lived that no other process ever gets a chance to run while it has the file open.
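
The sampling problem can be made visible without lsof itself. The sketch below assumes a Linux system with /proc mounted (which, as far as I know, is where lsof gets its information on Linux); the temporary file and the variable names are illustrative and not part of the question. It starts the loop from the question and takes a single snapshot of all readable file-descriptor tables roughly halfway between two writes.

```shell
# Illustrative only: a throwaway log file instead of ./log.
logfile=$(mktemp)

# The writer from the question: the file is open only while `date` runs.
while true; do date >> "$logfile"; sleep 1; done &
writer_pid=$!

# Sample roughly halfway between two writes.
sleep 0.5

# Count fd symlinks under /proc that point at the log file.
# Descriptors of other users' processes are unreadable without root; errors are ignored.
holders=$(ls -l /proc/[0-9]*/fd 2>/dev/null | grep -c -F -- "-> $logfile" || true)

echo "lines written so far: $(wc -l < "$logfile")"
echo "processes holding the file open right now: $holders"

# Stop the loop.
kill "$writer_pid"
```

On an otherwise idle machine this will most likely report that something has been written while no process holds the file open, which is the same picture lsof gave in the question.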

    If you want a version of this that you can catch in the act, keep the timed generation inside a parenthesized subshell, but make the writing one continuous operation, so that the subshell holds the file open for as long as the loop runs:

    (while true; do date ; sleep 1; done) > log &
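
A small sketch to check that this variant is observable, again assuming Linux with /proc; `writer_pid`, `target` and the temporary file are hypothetical names chosen for the example. Because the redirection is applied once to the whole subshell, its standard output (fd 1) should stay attached to the file between writes.

```shell
# Illustrative only: a throwaway log file; readlink -f resolves a symlinked temp dir.
logfile=$(readlink -f "$(mktemp)")

# The redirection belongs to the subshell, so the file stays open between writes.
( while true; do date; sleep 1; done ) > "$logfile" &
writer_pid=$!

# Give the subshell a moment to start and set up the redirection.
sleep 0.5

# fd 1 of the background subshell should resolve to the log file.
target=$(readlink "/proc/$writer_pid/fd/1")
echo "PID $writer_pid has stdout open on: $target"

# Stop the loop; the in-flight `sleep 1` child exits on its own shortly after.
kill "$writer_pid"
```

While such a loop is running, lsof on the file should list the subshell and whichever child (date or sleep) has inherited its stdout at that moment. Note that `>` truncates the file when the subshell starts; `>>` would keep appending as in the original loop.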
    

    Or, if you want to try to catch brief events, you might look at the inotify mechanism (you can view its documentation with man inotify). Bear in mind that it does not identify the actor, and when the actor is as short-lived as this one, you cannot follow up with an lsof-style search to figure out who it was.
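
A minimal sketch of the inotify route, assuming the inotifywait utility from the inotify-tools package is installed (it is not part of bash, and the sketch skips the check when it is missing). The temporary file and variable names are illustrative. It waits for the next modification of the file written by the loop from the question and prints what inotify reports.

```shell
# Illustrative only: a throwaway log file instead of ./log.
logfile=$(mktemp)

# The short-lived writer from the question.
while true; do date >> "$logfile"; sleep 1; done &
writer_pid=$!

if command -v inotifywait >/dev/null 2>&1; then
    # Wait for the next modification, giving up after 5 seconds.
    event=$(inotifywait -q -t 5 -e modify "$logfile")
else
    event="skipped: inotifywait not installed"
fi

# The event line names the file and the event type, but not the writer.
echo "$event"

# Stop the loop.
kill "$writer_pid"
```

The reported line contains the watched path and the event name only, so it confirms that a write happened without saying which process performed it.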