linux-kernel centos linux-device-driver kernel-module acpi

Trying to run EINJ but not able to find <debugfs mount point>/apei/einj


I'm running CentOS Stream 10 on a system, and I'm trying to run EINJ on it. According to the documentation, I should find a file named available_error_type under <debugfs mount point>/apei/einj, but it isn't there.

This is the kernel version.

[root@localhost ~]# uname -r
6.12.0-32.el10.x86_64

From the relevant documentation, it seems that my BIOS does support it.

[root@localhost ~]# dmesg | grep -i "einj"
[    0.011256] ACPI: EINJ 0x000000005A282B40 000150 (v01 ALASKA A M I    00000001 INTL 0
[    0.011292] ACPI: Reserving EINJ table memory at [mem 0x5a282b40-0x5a282c8f]
[root@localhost ~]# ls /sys/firmware/acpi/tables/EINJ
/sys/firmware/acpi/tables/EINJ

I checked whether the relevant options are enabled in my kernel config.

[root@localhost ~]# grep -i "CONFIG_ACPI_APEI" /boot/config-$(uname -r)
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
CONFIG_ACPI_APEI_EINJ_CXL=y
# CONFIG_ACPI_APEI_ERST_DEBUG is not set

I then ran this to mount debugfs:

mount -t debugfs none /sys/kernel/debug
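Before mounting, it can help to check whether debugfs is already mounted, since many distributions mount it at boot. The following is a minimal sketch; the function name debugfs_mountpoint is made up for illustration, and it assumes the mounts table has the usual /proc/self/mounts layout (device, mount point, filesystem type, ...).

```shell
#!/bin/sh
# Print the mount point of the first debugfs entry in a mounts table.
# $1: path to a mounts table (defaults to /proc/self/mounts).
# Returns non-zero if no debugfs entry is found.
# Note: debugfs_mountpoint is a hypothetical helper, not a standard tool.
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/self/mounts}"
}
```

Calling debugfs_mountpoint with no arguments prints the current mount point (typically /sys/kernel/debug) if there is one; a non-zero exit status means debugfs still has to be mounted.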

But when I look in /sys/kernel/debug, expecting to find the apei/einj directory with the available_error_type file, it isn't there.

I then ran both modprobe einj and modprobe einj param_extension=on, hoping to find the available_error_type file. lsmod confirms that einj is loaded, but the file still isn't there. However, after running the modprobe command, I see this in dmesg.

[  918.549253] EINJ: Error collecting EINJ resources.
[  918.549272] acpi-einj acpi-einj: probe with driver acpi-einj failed with error -22

I even ran modprobe cxl_core and modprobe cxl_acpi, but it didn't help.

What am I doing wrong? What resources does it require? I'm new to this stuff. In case it's relevant, the system I'm running this on has an Intel Xeon Gold 6240R.

Edit: fixed a spelling mistake. I forgot to mention that CONFIG_DEBUG_FS is also enabled. I also found this in dmesg, earlier in the log than the message that says Error collecting EINJ resources.

[   12.505110] [Firmware Bug]: APEI: Invalid physical address in GAR [0x0/0/0/0/0]
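To see all of these messages together, the kernel log can be filtered for EINJ- and APEI-related lines. This is only a sketch: apei_log_lines is a made-up helper name, and the pattern simply matches the strings that appear in the messages quoted above. For reference, error -22 is -EINVAL; the [Firmware Bug] line suggests, as far as I can tell, that the firmware handed the kernel an EINJ entry with a zero register address.

```shell
#!/bin/sh
# Read kernel log text on stdin and print only the lines that mention
# EINJ, APEI, or the acpi-einj driver.
# Note: apei_log_lines is a hypothetical helper, not a standard tool.
apei_log_lines() {
    grep -E 'EINJ|APEI|acpi-einj'
}
```

Typical use would be dmesg | apei_log_lines, run as root.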

Solution

  • I opened the BIOS setup (AMI), and in the Platform Configuration section there was an option named Error Injection Settings. It turned out this was disabled. After I enabled it, EINJ started working.
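After changing the BIOS setting and reloading the module, a quick check like the one below can confirm that the EINJ interface is now present. It is a sketch only: einj_status is a made-up helper name, and it assumes debugfs is mounted at /sys/kernel/debug unless another mount point is passed in.

```shell
#!/bin/sh
# Report whether the EINJ debugfs interface exposes available_error_type.
# $1: debugfs mount point (defaults to /sys/kernel/debug).
# Returns non-zero if the file is missing or unreadable.
# Note: einj_status is a hypothetical helper, not a standard tool.
einj_status() {
    einj_dir="${1:-/sys/kernel/debug}/apei/einj"
    if [ -r "$einj_dir/available_error_type" ]; then
        echo "ok: $einj_dir/available_error_type"
    else
        echo "missing: $einj_dir/available_error_type"
        return 1
    fi
}
```

Run it as root after modprobe einj, for example: einj_status && cat /sys/kernel/debug/apei/einj/available_error_type.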