I did the following actions on two separate terminals:
1st term:
Start a process that writes to a file in the background:
└──> while true; do date >> log; sleep 1; done &
[1] 20604
Get the PID of the last process started in the background:
└──> echo $!
20604
2nd term:
Display the content of the file being written:
└──> tail -f log
Thu May 7 18:48:20 CEST 2015
Thu May 7 18:48:21 CEST 2015
Thu May 7 18:48:22 CEST 2015
Thu May 7 18:48:23 CEST 2015
Thu May 7 18:48:24 CEST 2015
Thu May 7 18:48:25 CEST 2015
Thu May 7 18:48:26 CEST 2015
Thu May 7 18:48:27 CEST 2015
1st term:
Check who is accessing the file (note that there is only a reader):
└──> lsof log
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tail 21038 wakatana 3r REG 8,1 5340 797966 log
After the following kill, the tail -f on the 2nd terminal terminated and lsof returned empty output:
└──> kill 21038
└──> lsof log
└──>
2nd term:
Then I started tail -f again and saw that data was still being written to the log file. This means that some process is still writing to it:
└──> tail -f log
Thu May 7 18:52:33 CEST 2015
Thu May 7 18:52:34 CEST 2015
Thu May 7 18:52:35 CEST 2015
Thu May 7 18:52:36 CEST 2015
Thu May 7 18:52:37 CEST 2015
Thu May 7 18:52:38 CEST 2015
Thu May 7 18:52:39 CEST 2015
Thu May 7 18:52:40 CEST 2015
In this case I actually know the PID of the mysterious process that is writing to the file: it is PID 20604, so I can kill it and the log file will stop growing.
My questions are:
Why does lsof not display (even when it is issued repeatedly) the process that is actually writing to the log file? I understand that 20604 belongs to bash and that it is not bash that writes to the file directly but its child, date. But lsof displayed neither bash nor date.
PS: The shell used: GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)
You have a classic engineering problem in the form of asynchronous sampling here.
Essentially, after each comparatively long wait, a process is very quickly spun up, writes to the file, and then dies.
Entirely asynchronously to that, you run lsof, which looks for open files, but effectively only at one instant in time, which probably will not coincide with the moment the file is being written. (In actuality, lsof performs a multi-step operation, but there is probably only a one-shot opportunity to catch any given writer.)
It might be tempting to think that if you ran lsof enough times in a loop, you would eventually catch the writer in the act, and perhaps you will. However, depending on how your system's scheduler and I/O functionality work, it is possible that the writing process is so brief that there is never any chance for another process to run during it.
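If you want a version of this that you can catch in the act, keep the timed loop inside a parenthesized subshell, but make the writing one consistent operation by redirecting the subshell's output once, so that the file stays open for as long as the loop runs (note that > truncates the file; use >> to keep appending as in your original):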
(while true; do date ; sleep 1; done) > log &
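To make the difference between the two loops concrete, here is a minimal sketch that checks whether each background loop itself holds its log file open. It assumes Linux with /proc mounted; has_open is a hypothetical helper that stands in for running lsof on the file and looking for the PID, and the temporary directory and file names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Linux with /proc mounted. has_open is a hypothetical
# helper that stands in for running `lsof FILE` and looking for PID.

# has_open PID FILE: succeed if PID currently has FILE open.
has_open() {
    local pid=$1 target fd
    target=$(readlink -f "$2")
    for fd in /proc/"$pid"/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            return 0
        fi
    done
    return 1
}

dir=$(mktemp -d)

# Loop from the question: the file is opened and closed once per `date`,
# inside the short-lived child, never in the looping shell itself.
while true; do date >> "$dir/brief.log"; sleep 1; done &
brief_pid=$!

# Loop from this answer: the subshell opens the file once and keeps it open.
(while true; do date; sleep 1; done) > "$dir/held.log" &
held_pid=$!

sleep 1   # give both loops time to start

if has_open "$brief_pid" "$dir/brief.log"; then brief_open=yes; else brief_open=no; fi
if has_open "$held_pid" "$dir/held.log"; then held_open=yes; else held_open=no; fi
echo "question's loop holds its log open: $brief_open"
echo "subshell loop holds its log open:   $held_open"

# Clean up the background loops and the scratch directory.
kill "$brief_pid" "$held_pid"
wait "$brief_pid" "$held_pid" 2>/dev/null || true
rm -rf "$dir"
```

On a typical Linux system I would expect this to report "no" for the loop from the question and "yes" for the subshell variant, which is the same distinction lsof sees: only the second loop is a long-lived process with the file open.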
Alternatively, if you want to try to catch brief events, you might look at the inotify mechanism (you can view its documentation with man inotify). Bear in mind that it does not identify the actor, and when the actor is short-lived like this, you cannot then go and do an lsof-type search to figure out who it was.