linux-kernel system-calls procfs sysctl

O_TRUNC ignored when writing to the /proc filesystem


Trying to get rid of Ubuntu's apport by clearing /proc/sys/kernel/core_pattern using
sh -c ': > /proc/sys/kernel/core_pattern' does not work.

It looks like the O_TRUNC flag is ignored when writing to the /proc filesystem:

echo nonsense >| /proc/sys/kernel/core_pattern
strace sh -c ': > /proc/sys/kernel/core_pattern  # do not call apport'
...
openat(AT_FDCWD, "/proc/sys/kernel/core_pattern", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
...
close(3)                                = 0

cat /proc/sys/kernel/core_pattern
nonsense

I get an empty file when doing this in regular filesystems instead of the /proc filesystem.

Is that a kernel bug or a feature, perhaps even a documented one?

Edit: Clearing this setting through sysctl does not work:

sysctl kernel.core_pattern=""
sysctl: malformed setting "kernel.core_pattern="

It seems the sysctl program is unable to clear any kernel parameters while man core explicitly describes that an empty value is used to disable the mechanism.

Yes, echo >| /proc/sys/kernel/core_pattern works instead, but the object of this question is to find out whether this is a kernel bug, not to find a workaround.


Solution

  • This is because the O_TRUNC flag of the open* family of syscalls merely asks the VFS to set the size of the inode associated with the opened file to zero. The VFS carries this out itself, as an inode attribute change while the file is being opened, rather than through the ->open() or ->write() file_operations handlers implemented by whichever kernel module/driver/subsystem backs the file (for example the sysctl subsystem). The truncation is therefore transparent to those handlers.

    In other words, the file_operations handlers of the virtual sysctl files (e.g. /proc/sys/kernel/*) merely see a file with a size of 0 (the ->i_size field of struct inode). They do not know whether this was the result of a truncation or of a "normal" open, nor should they need to know.

    Since sysctl files (like nearly all procfs files) generate their content on the fly and do not track sizes, their functionality is implemented purely in terms of the read and write system calls (which also do not update the size in any way).
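
    The missing size tracking can be observed from user space. The following is a minimal sketch, assuming GNU coreutils (stat -c) and a world-readable /proc/sys/kernel/core_pattern; the variable names are only illustrative. It compares the size stored in the inode with the number of bytes a read actually returns:

    ```shell
    f=/proc/sys/kernel/core_pattern          # assumed to exist and be readable
    if [ -r "$f" ]; then
        reported_size=$(stat -c %s "$f")     # size recorded in the inode (GNU stat)
        bytes_read=$(( $(wc -c < "$f") ))    # bytes actually returned by read()
        echo "stat reports $reported_size bytes, read() returned $bytes_read bytes"
    fi
    ```

    On a typical system the reported size is 0 even though reading the file returns data, which is why setting the size to 0 again changes nothing.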

    Indeed, using : > PATH will merely do open + close, while a simple echo > PATH will write a newline character after opening, thus you observe two different outcomes. You would observe the same behavior as : > PATH using truncate -s 0 PATH, though this time the truncation is done explicitly after opening through ftruncate (at least on my system).
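
    For comparison, here is a small sketch of the same three commands on a regular filesystem, where the inode size is what determines the content. It uses a scratch file from mktemp (not a /proc file), and the variable names are made up for the demo:

    ```shell
    tmp=$(mktemp)                            # scratch file on a regular filesystem

    printf 'nonsense\n' > "$tmp"
    : > "$tmp"                               # open with O_TRUNC, then close; nothing is written
    size_after_colon=$(( $(wc -c < "$tmp") ))

    printf 'nonsense\n' > "$tmp"
    echo > "$tmp"                            # open with O_TRUNC, then write "\n"
    size_after_echo=$(( $(wc -c < "$tmp") ))

    printf 'nonsense\n' > "$tmp"
    truncate -s 0 "$tmp"                     # explicit truncation, still no write
    size_after_truncate=$(( $(wc -c < "$tmp") ))

    echo "': >' leaves $size_after_colon bytes, 'echo >' leaves $size_after_echo, 'truncate -s 0' leaves $size_after_truncate"
    rm -f "$tmp"
    ```

    Here the two truncation-only variants leave 0 bytes and echo leaves 1 byte (the newline). On a sysctl file only the variant that actually issues a write reaches the handler.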

    man core explicitly describes that an empty value is used to disable the mechanism

    [...]

    Is that a kernel bug or a feature, perhaps even a documented one?

    Human readable/writable files under procfs are usually designed to work in a line-oriented fashion, so I would assume that the term "empty" here simply means that the value of the option is empty as a result of writing an empty line to the file. If anything, I would call this an undocumented feature rather than a bug.


    Here are some example traces from my system:

    root@xxx:~# cat /proc/sys/kernel/core_pattern
    |/usr/share/apport/apport %p %s %c %d %P %E
    root@xxx:~# strace -f -e openat,write,close,dup2 sh -c ': > /proc/sys/kernel/core_pattern'
    ...
    openat(AT_FDCWD, "/proc/sys/kernel/core_pattern", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    close(1)                                = 0
    dup2(3, 1)                              = 1
    close(3)                                = 0
    ...
    +++ exited with 0 +++
    root@xxx:~# cat /proc/sys/kernel/core_pattern
    |/usr/share/apport/apport %p %s %c %d %P %E
    root@xxx:~#
    
    root@xxx:~# cat /proc/sys/kernel/core_pattern
    |/usr/share/apport/apport %p %s %c %d %P %E
    root@xxx:~# strace -f -e openat,write,close,dup2 sh -c 'echo > /proc/sys/kernel/core_pattern'
    ...
    openat(AT_FDCWD, "/proc/sys/kernel/core_pattern", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    close(1)                                = 0
    dup2(3, 1)                              = 1
    close(3)                                = 0
    write(1, "\n", 1)                       = 1
    ...
    +++ exited with 0 +++
    root@xxx:~# cat /proc/sys/kernel/core_pattern
    
    root@xxx:~#
    

    If you want to take a look at the actual implementation of open/read/write/close for sysctl files, check fs/proc/proc_sysctl.c (the procfs side) and kernel/sysctl.c (the generic value handlers) in the kernel source tree. There are different tables for the different sysctl facilities, e.g. kernel, vm, etc.