I have this logger.py file:
import logging
from logging.handlers import TimedRotatingFileHandler
FORMAT = '%(asctime)s %(message)s'
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)
handler = TimedRotatingFileHandler('/mnt/NFS-drive/debug.log', when='midnight', backupCount=7)
handler.setFormatter(logging.Formatter(FORMAT))
log.addHandler(handler)
I use it in this countdown module:
import time
from logger import log
def countdown(minutes=10, callback=None):
seconds = minutes * 60
# Run the countdown loop
while seconds >= 0:
# Calculate remaining minutes and seconds
remaining_minutes = seconds // 60
remaining_seconds = seconds % 60
# Print the remaining minutes and seconds
log.info("Time remaining: {:02d}:{:02d}".format(remaining_minutes, remaining_seconds))
# Sleep for 1 second
time.sleep(1)
# Decrement the seconds
seconds -= 1
if callback:
log.info("Timer finished...")
callback()
Then I run a multithreaded script which writes logs to debug.log
. This script runs on multiple machines as well. They all write logs to the same NFS mounted file.
I'm see two issues
Error:
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.10/logging/__init__.py", line 1104, in emit
self.flush()
File "/usr/lib/python3.10/logging/__init__.py", line 1084, in flush
self.stream.flush()
OSError: [Errno 116] Stale file handle
Call stack:
File "/mnt/nf/myscripts/run.py", line 128, in run
countdown(minutes=5, callback=lambda: log.info("Checking queue for new messages"))
File "/mnt/nf/myscripts/countdown.py", line 14, in countdown
log.info("Time remaining: {:02d}:{:02d}".format(remaining_minutes, remaining_seconds))
Message: 'Time remaining: 03:25'
Arguments: ()
And I'm seeing all these .nfs files in the directory:
Any idea why i'm suddenly seeing this error and getting this files?
Don't log concurrently to the same file from multiple processes (unless you have a logging service to handle synchronization). Especially on NFS:
O_APPEND may lead to corrupted files on NFS filesystems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition.
One such race condition is where file name lookup already took place, then the resulting NFS file handle is transmitted back to the client, the corresponding inode is deleted or renamed when logs are rotated, and then the NFS client tries to open that inode by its now-stale NFS file handle.
As for those .nfs files - that is NFS silly rename - see here for details. I assume multiple instances of the script are running on the same machine, such when one has the file open, the other rotates the log.