[SOLVED] Logging causing OSError and .nfs files

Logging causing OSError and .nfs files

I have this logger.py file:

import logging
from logging.handlers import TimedRotatingFileHandler

FORMAT = '%(asctime)s %(message)s'
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)
handler = TimedRotatingFileHandler('/mnt/NFS-drive/debug.log', when='midnight', backupCount=7)
handler.setFormatter(logging.Formatter(FORMAT))
log.addHandler(handler)

I use it in this countdown module:

import time
from logger import log

def countdown(minutes=10, callback=None):
    seconds = minutes * 60

    # Run the countdown loop
    while seconds >= 0:
        # Calculate remaining minutes and seconds
        remaining_minutes = seconds // 60
        remaining_seconds = seconds % 60

        # Print the remaining minutes and seconds
        log.info("Time remaining: {:02d}:{:02d}".format(remaining_minutes, remaining_seconds))

        # Sleep for 1 second
        time.sleep(1)

        # Decrement the seconds
        seconds -= 1

    if callback:
        log.info("Timer finished...")
        callback()

Then I run a multithreaded script which writes logs to debug.log. This script runs on multiple machines as well. They all write logs to the same NFS mounted file.

I'm see two issues

Error:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1104, in emit
    self.flush()
  File "/usr/lib/python3.10/logging/__init__.py", line 1084, in flush
    self.stream.flush()
OSError: [Errno 116] Stale file handle
Call stack:
  File "/mnt/nf/myscripts/run.py", line 128, in run
    countdown(minutes=5, callback=lambda: log.info("Checking queue for new messages"))
  File "/mnt/nf/myscripts/countdown.py", line 14, in countdown
    log.info("Time remaining: {:02d}:{:02d}".format(remaining_minutes, remaining_seconds))
Message: 'Time remaining: 03:25'
Arguments: ()

And I'm seeing all these .nfs files in the directory:

Any idea why i'm suddenly seeing this error and getting this files?

Solution

Don't log concurrently to the same file from multiple processes (unless you have a logging service to handle synchronization). Especially on NFS:

O_APPEND may lead to corrupted files on NFS filesystems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition.

One such race condition is where file name lookup already took place, then the resulting NFS file handle is transmitted back to the client, the corresponding inode is deleted or renamed when logs are rotated, and then the NFS client tries to open that inode by its now-stale NFS file handle.

As for those .nfs files - that is NFS silly rename - see here for details. I assume multiple instances of the script are running on the same machine, such when one has the file open, the other rotates the log.