AER Multiple Corrected error received 0000:00:1c.4

From Thomas-Krenn-Wiki

Under Linux, a single PCIe error that has already been corrected by ECC mechanisms can trigger repeated “AER Multiple Corrected error received” log messages. The messages repeat because the AER driver receives the error notification but does not clear it afterwards. This article shows a workaround that stops the repeated messages.

Affected hardware

In our tests with Ubuntu 18.04 LTS (Linux kernel 5.4), we reproduced the repeated log messages on the following hardware:

  • LES plus v2
  • Azurewave AW-CB161H Mini PCIe half size Wifi module

Log file excerpt

Apr  8 08:45:41 ubuntu1804 systemd-networkd[961]: enp2s0: Link UP
Apr  8 08:45:41 ubuntu1804 NetworkManager[1187]: <info>  [1617864341.5616] wifi-nl80211: (wlp3s0): using nl80211 for WiFi device control
Apr  8 08:45:41 ubuntu1804 NetworkManager[1187]: <info>  [1617864341.5618] device (wlp3s0): driver supports Access Point (AP) mode
Apr  8 08:45:41 ubuntu1804 NetworkManager[1187]: <info>  [1617864341.5627] manager: (wlp3s0): new 802.11 WiFi device (/org/freedesktop/NetworkManager/Devices/4)
Apr  8 08:45:41 ubuntu1804 NetworkManager[1187]: <info>  [1617864341.5634] device (wlp3s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Apr  8 08:45:41 ubuntu1804 kernel: [    8.546423] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: 0000:00:1c.4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.546437] pcieport 0000:00:1c.4: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr  8 08:45:41 ubuntu1804 kernel: [    8.547307] pcieport 0000:00:1c.4: AER:   device [8086:9d14] error status/mask=00000001/00002000
Apr  8 08:45:41 ubuntu1804 kernel: [    8.548191] pcieport 0000:00:1c.4: AER:    [ 0] RxErr
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549031] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: 0000:00:1c.4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549037] pcieport 0000:00:1c.4: AER: can't find device of ID00e4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549038] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: 0000:00:1c.4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549041] pcieport 0000:00:1c.4: AER: can't find device of ID00e4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549042] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: 0000:00:1c.4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549046] pcieport 0000:00:1c.4: AER: can't find device of ID00e4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549211] pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Apr  8 08:45:41 ubuntu1804 kernel: [    8.549215] pcieport 0000:00:1c.4: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr  8 08:45:41 ubuntu1804 kernel: [    8.550072] pcieport 0000:00:1c.4: AER:   device [8086:9d14] error status/mask=00000001/00002000
Apr  8 08:45:41 ubuntu1804 kernel: [    8.550954] pcieport 0000:00:1c.4: AER:    [ 0] RxErr

[...]
tk@ubuntu1804:~$ uname -a
Linux ubuntu1804 5.4.0-65-generic #73~18.04.1-Ubuntu SMP Tue Jan 19 09:02:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
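The hexadecimal fields in the excerpt can be decoded by hand. The following Python sketch is our own illustration, not part of the original article; the function names are hypothetical. It uses the bit assignments of the PCIe Correctable Error Status/Mask registers (the same errors the kernel prints with short names such as RxErr) and assumes that “ID00e4” in the “can't find device” lines is a standard 16-bit PCIe requester ID (bus in bits 15 to 8, device in bits 7 to 3, function in bits 2 to 0).

```python
import re

# Bit names of the PCIe AER Correctable Error Status/Mask registers.
# Bits not listed here are reserved.
CORRECTABLE_BITS = {
    0: "Receiver Error (RxErr)",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_bits(value):
    """Return the names of all known bits set in a register value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_status_line(line):
    """Parse 'error status/mask=XXXXXXXX/YYYYYYYY' and decode both values."""
    m = re.search(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})", line)
    if m is None:
        raise ValueError("no status/mask field in line")
    status, mask = int(m.group(1), 16), int(m.group(2), 16)
    return decode_bits(status), decode_bits(mask)


def decode_requester_id(rid):
    """Split a 16-bit requester ID (assumed layout) into bus:device.function."""
    bus = (rid >> 8) & 0xFF
    dev = (rid >> 3) & 0x1F
    fn = rid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn}"


line = "pcieport 0000:00:1c.4: AER:   device [8086:9d14] error status/mask=00000001/00002000"
print(decode_status_line(line))
print(decode_requester_id(0x00E4))
```

Decoded this way, the status value 00000001 is a single Receiver Error and the mask 00002000 masks only Advisory Non-Fatal errors. Under the stated assumption, ID 00e4 maps back to 00:1c.4, the root port that logs the messages.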

Cause

According to Bjorn Helgaas, Linux kernel developer and maintainer of the PCI subsystem, the log entries repeat because the AER driver receives the notification about the corrected error but does not clear it afterwards. At the time of his statement (February 2016), nobody had fixed this bug yet. As a workaround, he suggests disabling AER entirely with the boot parameter “pci=noaer”.[1]
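Why an uncleared notification keeps reappearing can be illustrated with a toy model. In PCIe, AER status bits are write-1-to-clear (RW1C): the hardware sets a bit, and software acknowledges it by writing a 1 back to that bit. The Python sketch below is a simplified, hypothetical model (class and function names are ours, not kernel code) that compares a handler which clears the bit after reporting it with one that does not. It does not reproduce the actual code path in the Linux AER driver, which is interrupt-driven.

```python
class CorrectableStatusRegister:
    """Toy model of a write-1-to-clear (RW1C) status register."""

    def __init__(self):
        self.value = 0

    def hw_set(self, bit):
        # The device latches an error by setting its status bit.
        self.value |= 1 << bit

    def read(self):
        return self.value

    def write(self, data):
        # RW1C semantics: each 1 bit in `data` clears the matching status bit;
        # 0 bits leave the register unchanged.
        self.value &= ~data


def poll(reg, clear_after_report, polls):
    """Count how many times a pending error is reported over `polls` checks."""
    reports = 0
    for _ in range(polls):
        status = reg.read()
        if status:
            reports += 1  # one log message per check that sees a set bit
            if clear_after_report:
                reg.write(status)  # acknowledge exactly the bits just reported
    return reports


for clear in (True, False):
    reg = CorrectableStatusRegister()
    reg.hw_set(0)  # a single Receiver Error, as in the log excerpt
    print(f"clear={clear}: {poll(reg, clear, polls=5)} report(s)")
```

With clearing, the single Receiver Error is reported once; without it, every check reports the same error again (1 versus 5 reports in this model).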

Workaround

To stop the messages, disable Advanced Error Reporting (AER), either in the BIOS or with a Linux kernel parameter. In the BIOS, make the following settings:

  • Chipset -> PCH-IO Configuration -> PCI Express Configuration -> mPCI Slot -> Advanced Error Reporting -> Disabled
  • Chipset -> PCH-IO Configuration -> PCI Express Configuration -> mPCI Slot -> ASPM -> Auto

Alternatively:

  • Use the kernel parameter “pci=noaer”.[1][2]
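On Ubuntu, kernel parameters such as “pci=noaer” are usually added to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by sudo update-grub and a reboot; afterwards, /proc/cmdline shows the parameters of the running kernel. The Python sketch below uses hypothetical helper names of our own: it appends the parameter to such a line without duplicating it and checks whether noaer is active, assuming the kernel's comma-separated pci=option[,option...] syntax. It does not write /etc/default/grub itself.

```python
import os


def add_kernel_param(grub_line, param="pci=noaer"):
    """Return a GRUB_CMDLINE_LINUX_DEFAULT line with `param` appended once.

    Expects a double-quoted line such as: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
    """
    key, sep, value = grub_line.rstrip("\n").partition("=")
    if key.strip() != "GRUB_CMDLINE_LINUX_DEFAULT" or not sep:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    params = value.strip().strip('"').split()  # drop the quotes, split on spaces
    if param not in params:  # idempotent: add the parameter only once
        params.append(param)
    return f'{key}="{" ".join(params)}"'


def noaer_active(cmdline):
    """True if a pci= option list on the kernel command line contains noaer."""
    for token in cmdline.split():
        if token.startswith("pci=") and "noaer" in token[len("pci="):].split(","):
            return True
    return False


print(add_kernel_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'))

# On a running Linux system, check the command line the kernel was booted with.
if os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        print("pci=noaer active:", noaer_active(f.read()))
```

For the sample line, the helper returns GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"; running it again on that result leaves the line unchanged.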

References

  1. Re: 4.4.x kernel (only) gives pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4 (lore.kernel.org, 15.02.2016): "Short story: the AER driver receives the corrected error notification but fails to clear it. Nobody has stepped up to fix the bug yet. You can probably work around it by disabling AER completely by booting with "pci=noaer"." (Bjorn Helgaas, Linux PCI subsystem maintainer)
  2. Bug 196183 - AER: Corrected error received: id=00e8, Comment 5: "According to https://bbs.archlinux.org/viewtopic.php?id=232917 (URL corrected here), one could use pci=noaer to just disable AER: 'it seems disabling AER should be safe, and you are still left with basic PCIe error reporting capabilities. AER is just for "advanced" error reporting.'"


Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After completing her training as a multilingual business assistant, she joined the company as assistant to Product Management, where she is responsible for translating texts and organizing the department.

