clinuxx86-64sigbus

Can I rule out that SIGBUS is raised by a "minor page fault"? (Kernel log has no allocation failure)


Motivation

I am trying to improve my understanding of a SIGBUS error in Xwayland. This has been seen by several Fedora Linux users since around the 20th of February 2018, with Xwayland 1.19.6-5.fc27.x86_64 and Linux kernel 4.15.3-300.fc27.x86-64.

Sadly I do not have the kernel "segfault" log message (or equivalent for SIGBUS). Xwayland has some pointless code which traps the fatal signal. But I can see siginfo by debugging the coredump, and this seems to be nearly as good.

Definition

I understand that a "major page fault" occurs when a page of virtual memory is not available in RAM, and must be read from disk. I think I'm specifically interested in pages backed by a ext4 filesystem (e.g. no direct access to block devices) for this question.

Therefore a "minor page fault" is when no disk access is necessary. I assume the difference is fairly well-defined because Linux exposes counters for major and minor page faults.

My question

If the kernel sends a program SIGBUS, I wonder if I should generally expect that this would have been a major page fault.

According to the coredump and disassembly, the program is reading memory when it receives SIGBUS, not writing it. The fault address in siginfo->si_addr is within a mapped system executable, which is not writeable by the user, and the address seems within the bounds of the current file length. In fact when debugging the coredump, I have read very convincing values from the memory address. It seems the coredump generation process had no difficulty reading this address :-(.

I'm also confident in ruling out the "invalid address alignment" case (BUS_ADRALN), because siginfo->si_code is 2, i.e. BUS_ADRERR, "non-existent physical address". Also because I'm on x86, which permits unaligned accesses in most cases, and the trap isn't in any SSE extended instruction.

I considered what the kernel is normally responsible for, when it handles a page fault which it determines is "minor". I suppose minor faults could fail to allocate memory, and hence raise SIGBUS. However, I believe I would have noticed such an allocation failure:

I have plenty of free swap to evict user pages to, and I did not notice the usual obvious slowdown that occurs when my system starts swapping. The crash happened a few seconds after waking a laptop from suspend to ram, which would not have been long enough to fill 8GB of swap even at ~100MB/s. Nor did I see the dread Out Of Memory (OOM) killer appear in kernel logs, as I would expect if the kernel failed allocating a page frame or page table.

Is there some other possibility that a minor page fault could have failed and caused the SIGBUS? I.e. is there some cause which I would not have noticed, when looking for errors in the kernel log? And which could have a quick onset?

Again, multiple coredumps are showing this as a page fault triggered by reading from a mapped file on the filesystem.

Ulterior motivation

I would really like to have missed a case for minor page faults. Because the horrifying flipside of this is that I don't see how this SIGBUS could have been cause by the hard page fault side of things. Several of us users have very similar-looking errors, starting a few months ago. There is no IO error in my kernel logs. During normal operation, I have no IO errors when reading the indicated file. I have no errors when running rpm --verify --all, or when running an extended SMART test on the HDD. Unfortunately I seem to have very few suspects. The closest suspect I have is a kernel upgrade, which I would obviously prefer to rule out; the dates don't exactly prove it but it's not entirely ruled out. Next closest in the dates is this years microcode update; this seems like it would be even harder to nail down.

Known causes of minor page faults

  1. Logically, it sounds like minor page faults occur when implementing copy-on-write for MAP_PRIVATE mappings.
  2. It should also include read faults on /dev/zero or MAP_ANONYMOUS, assuming a kernel did not implement them as reading a shared zero page and did not implement them to allocate pages for the entire mapping immediately.
  3. But more generally, it could be any first access to a page. This is because it seems that the page tables for memory mappings are generally populated on-demand. (Which would be done by a page fault, and if the file page was already in cache, it would only be a minor page fault).

    MAP_NONBLOCK (since Linux 2.5.46)

    This flag is meaningful only in conjunction with MAP_POPULATE. Don't perform read-ahead: create page tables entries only for pages that are already present in RAM. Since Linux 2.6.23, this flag causes MAP_POPULATE to do nothing. One day, the combina‐ tion of MAP_POPULATE and MAP_NONBLOCK may be reimplemented.


EDIT: Further excerpts detailing the above

A commenter asked for more concrete details, to clarify the faulting address and instruction. There are many excerpts in the initial link https://bugzilla.redhat.com/show_bug.cgi?id=1557682

The fault varies as described in the bug link. Here are fresh excerpts from a recent instance.

$ gdb 2018-03-21.core
...
Core was generated by `/usr/bin/Xwayland :0 -rootless -terminate -core -listen 4 -listen 5 -displayfd'.
Program terminated with signal SIGBUS, Bus error.
#0  _dl_fixup (l=0x7fc0be2e0130, reloc_arg=203) at ../elf/dl-runtime.c:73
73    const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
[Current thread is 1 (Thread 0x7fc0be29fa80 (LWP 1918))]
(gdb) p $_siginfo.si_signum
$1 = 7
(gdb) p $_siginfo.si_code
$2 = 2
(gdb) p $_siginfo._sifields._sigfault.si_addr
$3 = (void *) 0x41bd80
(gdb) disassemble
Dump of assembler code for function _dl_fixup:
   0x00007fc0be0c8bd0 <+0>: push   %rbx
   0x00007fc0be0c8bd1 <+1>: mov    %rdi,%r10
   0x00007fc0be0c8bd4 <+4>: mov    %esi,%esi
   0x00007fc0be0c8bd6 <+6>: lea    (%rsi,%rsi,2),%rdx
   0x00007fc0be0c8bda <+10>:    sub    $0x10,%rsp
   0x00007fc0be0c8bde <+14>:    mov    0x68(%rdi),%rax
   0x00007fc0be0c8be2 <+18>:    mov    0x8(%rax),%rdi
   0x00007fc0be0c8be6 <+22>:    mov    0xf8(%r10),%rax
   0x00007fc0be0c8bed <+29>:    mov    0x8(%rax),%rax
   0x00007fc0be0c8bf1 <+33>:    lea    (%rax,%rdx,8),%r8
   0x00007fc0be0c8bf5 <+37>:    mov    0x70(%r10),%rax
=> 0x00007fc0be0c8bf9 <+41>:    mov    0x8(%r8),%rcx
(gdb) p/x $r8
$4 = 0x41bd78
(gdb) p/x $r8 + 8
$5 = 0x41bd80

Note this instruction is fetching the value reloc->r_info as per the highlighted source line.

(gdb) p reloc
$6 = (const Elf64_Rela * const) 0x41bd78
(gdb) p &reloc->r_info
$7 = (Elf64_Xword *) 0x41bd80
(gdb) p *reloc
$8 = {r_offset = 8443504, r_info = 936302870535, r_addend = 0}

The faulting address falls within the text mapping below (from maps file captured by abrtd):

00400000-0060b000 r-xp 00000000 fd:00 1708508                            /usr/bin/Xwayland
0080a000-0080d000 r--p 0020a000 fd:00 1708508                            /usr/bin/Xwayland
0080d000-00817000 rw-p 0020d000 fd:00 1708508                            /usr/bin/Xwayland

$ size -x /usr/bin/Xwayland
   text    data     bss     dec     hex filename
0x209ffb     0xbe9d 0x1f3e0 2314872  235278 /usr/bin/Xwayland

Solution

  • Thanks everyone for your support. It was indeed a transient IO error. It seems the SIGBUS read-fault path doesn't necessarily log anything in the kernel log, unlike the cases I'm used to seeing for IO errors.

    https://marc.info/?l=linux-ide&m=152232081917215&w=2

    v4.15 intermittent errors on suspend/resume

    To anyone waiting for the other show to drop on the SATA LPM work...

    I've found something that's at least in the same area. It triggered a fsck on my system 2 days ago. Evidence suggests it's occurred on many other machines. I felt that was reason enough to give you a heads up :).

    I checked and I don't seem to have LPM enabled during runtime, even when running on battery. My errors are all on suspend/resume, so maybe that behaviour was changed at the same time?

    It doesn't always show in kernel logs. What I first noticed was a mysterious SIGBUS that kills Xwayland (and hence the entire Gnome session) on resume from suspend. It surprised me to learn that this SIGBUS can happen, without leaving anything like the read errors I'm used to seeing in the kernel log!

    My coredumps show the SIGBUS fault address is an instruction read inside the program code of Xwayland. The backtraces vary along the same call chain - the common factor is that they're always at the first instruction of the function. I assume it varies according to which page is not currently in-core, and hence triggers the failing read request.

    There are hundreds of backtraces along this same call chain from other users, reported automatically to Fedora, that look the same. At least so far we don't have any more plausible for them. I admit it's funny that Xwayland is so prominent, and I haven't been swamped with SIGBUS in other processes, but I stand by this analysis.

    These crashes started within 24 hours of Fedora upgrading to kernel v4.15.

    Fedora bug for the Xwayland SIGBUS: https://bugzilla.redhat.com/show_bug.cgi?id=1553979

    My duplicate bug I've been spamming with puzzled comments: https://bugzilla.redhat.com/show_bug.cgi?id=1557682

    The earliest and biggest of the many crash report buckets:

    EXT4 filesystem error:

    Mar 27 11:28:30 alan-laptop kernel: PM: suspend exit
    ...
    Mar 27 11:28:30 alan-laptop kernel: EXT4-fs error (device dm-2):  ext4_find_entry:1436: inode #5514052: comm thunderbird: reading directory lblock 0
    Mar 27 11:28:30 alan-laptop kernel: Buffer I/O error on dev dm-2, logical block 0, lost sync page write
    (this marked the FS as needing fsck on next boot)
    

    More frequently, it logs these swap errors:

    Mar 02 18:47:03 alan-laptop kernel: Restarting tasks ...
    Mar 02 18:47:03 alan-laptop kernel: Read-error on swap-device (253:1:836184)
    Mar 02 18:47:06 alan-laptop kernel: Read-error on swap-device (253:1:580280)
    

    My laptop LPM status, even after removing AC power:

    $ head /sys/class/scsi_host/host*/link_power_management_policy
    ==> /sys/class/scsi_host/host0/link_power_management_policy <==
    max_performance
    
    ==> /sys/class/scsi_host/host1/link_power_management_policy <==
    max_performance
    

    My laptop is a Dell Lattitude E5450. CPU is i5-5300U (a Broadwell).