Random Reboots on AMD EPYC Servers
On servers with AMD EPYC CPUs, random reboots can occur when running Linux with KVM. After such a reboot, the syslog contains entries from the Boot Error Record Table (BERT) with the hint fru_text: ProcessorError. This article provides detailed information on this error and proposed solutions.
fru_text problem: ProcessorError
Random reboots occur on servers with AMD EPYC CPUs when running Linux with KVM. After the reboot, the kernel log contains the following entries (log excerpt from November 2019):
Nov 4 15:43:07 debian-10 kernel: [ 1.569903] BERT: Error records from previous boot:
Nov 4 15:43:07 debian-10 kernel: [ 1.570000] [Hardware Error]: event severity: info
Nov 4 15:43:07 debian-10 kernel: [ 1.570095] [Hardware Error]: Error 0, type: fatal
Nov 4 15:43:07 debian-10 kernel: [ 1.570191] [Hardware Error]: fru_text: ProcessorError
Nov 4 15:43:07 debian-10 kernel: [ 1.570288] [Hardware Error]: section_type: IA32/X64 processor error
Nov 4 15:43:07 debian-10 kernel: [ 1.570389] [Hardware Error]: Local APIC_ID: 0x0
Nov 4 15:43:07 debian-10 kernel: [ 1.570484] [Hardware Error]: CPUID Info:
Nov 4 15:43:07 debian-10 kernel: [ 1.570579] [Hardware Error]: 00000000: 00800f12 00000000 00300800 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.570682] [Hardware Error]: 00000010: 76d8320b 00000000 178bfbff 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.570786] [Hardware Error]: 00000020: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0
Nov 4 15:43:07 debian-10 kernel: [ 1.570889] [Hardware Error]: Error Information Structure 0:
Nov 4 15:43:07 debian-10 kernel: [ 1.570988] [Hardware Error]: Error Structure Type: unknown
Nov 4 15:43:07 debian-10 kernel: [ 1.571087] [Hardware Error]: Error Structure Type: 00000001-0000-0000-2700-980000000000
Nov 4 15:43:07 debian-10 kernel: [ 1.571221] [Hardware Error]: Error 1, type: fatal
Nov 4 15:43:07 debian-10 kernel: [ 1.571316] [Hardware Error]: fru_text: ProcessorError
Nov 4 15:43:07 debian-10 kernel: [ 1.571412] [Hardware Error]: section_type: IA32/X64 processor error
Nov 4 15:43:07 debian-10 kernel: [ 1.571513] [Hardware Error]: Local APIC_ID: 0x1
Nov 4 15:43:07 debian-10 kernel: [ 1.571608] [Hardware Error]: CPUID Info:
Nov 4 15:43:07 debian-10 kernel: [ 1.571701] [Hardware Error]: 00000000: 00800f12 00000000 01300800 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.571805] [Hardware Error]: 00000010: 76d8320b 00000000 178bfbff 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.571908] [Hardware Error]: 00000020: a55701f5 43dee3ef 9b2472ac 2cad3f57
Nov 4 15:43:07 debian-10 kernel: [ 1.572011] [Hardware Error]: Error Information Structure 0:
Nov 4 15:43:07 debian-10 kernel: [ 1.572109] [Hardware Error]: Error Structure Type: unknown
Nov 4 15:43:07 debian-10 kernel: [ 1.572208] [Hardware Error]: Error Structure Type: 00000001-0000-0000-1f00-4d0600000000
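When many such records pile up in the syslog, it can help to summarize them per CPU. The following Python sketch groups the [Hardware Error] lines into one record per "Error N" entry and decodes the first CPUID word into family, model and stepping. The function names parse_bert_records and decode_cpuid_signature are hypothetical helpers introduced here for illustration. The sketch assumes one log entry per line and assumes that the first 32-bit word of the CPUID dump is the EAX value of CPUID leaf 1, decoded according to the AMD convention.

```python
import re

# Payload after the "[Hardware Error]:" marker of a kernel log line.
HW_LINE = re.compile(r"\[Hardware Error\]:\s*(?P<payload>.*)$")
# Start of a new record, e.g. "Error 0, type: fatal".
ERROR_START = re.compile(r"^Error (?P<index>\d+), type: (?P<severity>\w+)$")
# Hex dump row, e.g. "00000000: 00800f12 00000000 00300800 00000000".
HEX_DUMP = re.compile(r"^[0-9a-f]{8}: (?P<words>(?:[0-9a-f]{8} ?)+)$")


def parse_bert_records(log_text):
    """Group [Hardware Error] lines into one dict per 'Error N' record."""
    records = []
    current = None
    for line in log_text.splitlines():
        match = HW_LINE.search(line)
        if not match:
            continue
        payload = match.group("payload").strip()
        start = ERROR_START.match(payload)
        if start:
            current = {
                "index": int(start.group("index")),
                "severity": start.group("severity"),
                "cpuid_words": [],
            }
            records.append(current)
            continue
        if current is None:
            continue  # lines before the first record, e.g. "event severity"
        dump = HEX_DUMP.match(payload)
        if dump:
            current["cpuid_words"].extend(
                int(word, 16) for word in dump.group("words").split()
            )
        elif payload.startswith("fru_text:"):
            current["fru_text"] = payload.split(":", 1)[1].strip()
        elif payload.startswith("Local APIC_ID:"):
            current["apic_id"] = int(payload.split(":", 1)[1].strip(), 16)
    return records


def decode_cpuid_signature(eax):
    """Return (family, model, stepping) from a CPUID leaf 1 EAX value.

    Assumption: AMD convention, where the extended fields only apply
    when the base family is 0xF.
    """
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if base_family == 0xF:
        return base_family + ext_family, (ext_model << 4) | base_model, stepping
    return base_family, base_model, stepping
```

For the CPUID word 0x00800f12 from the excerpt, decode_cpuid_signature returns family 0x17, model 0x1 and stepping 2, which fits the first EPYC generation listed under the affected systems.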
Affected systems
The described problem appeared on a system with the following configuration:
- Supermicro Mainboard H11DSi-NT
- 2x AMD EPYC 7401
- Debian GNU/Linux 10 (with KVM)
- Linux Kernel: 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20) x86_64 GNU/Linux
- BIOS version: BIOS 1.0c
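To compare your own server with the configuration above, the mainboard, BIOS version and kernel release can be read out on the running system. The following Python sketch reads the DMI attributes that Linux exposes under /sys/class/dmi/id. The helper names read_dmi_info and summarize_system are hypothetical and only serve this illustration, and some attributes may be missing or unreadable depending on the platform and permissions.

```python
import platform
from pathlib import Path

# DMI attributes exposed by the Linux kernel under /sys/class/dmi/id.
DMI_FIELDS = ("board_vendor", "board_name", "bios_version", "bios_date")


def read_dmi_info(dmi_dir="/sys/class/dmi/id"):
    """Return the selected DMI fields, or None where a field is unreadable."""
    info = {}
    for field in DMI_FIELDS:
        try:
            info[field] = (Path(dmi_dir) / field).read_text().strip()
        except OSError:
            info[field] = None
    return info


def summarize_system(dmi_dir="/sys/class/dmi/id"):
    """Combine the DMI fields with the running kernel release."""
    summary = read_dmi_info(dmi_dir)
    summary["kernel_release"] = platform.release()
    return summary


if __name__ == "__main__":
    for key, value in summarize_system().items():
        print(f"{key}: {value}")
```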
In the Proxmox forum, there are also reports of random reboots with the BERT error message "fru_text: ProcessorError" on the following mainboards:[1]
- Supermicro H11SSL-i
- Supermicro H11DSU-iN
In the Fedora forum, there is also a report of the error message "fru_text: ProcessorError" on the following system:[2]
- Supermicro H11SSL-i
- AMD EPYC 7301
- Ubuntu 18.04 LTS
- VMware Workstation
Proposed solution
Users in the Fedora forum wrote that changing the following BIOS parameters solved the problem:[2]
- Advanced -> NB Configuration -> IOMMU (change to Enabled)
- Advanced -> PCIe/PCI/PnP Configuration -> SR-IOV Support (change to Enabled)
In general, we recommend updating to the current BIOS version. Newer BIOS versions contain newer AMD AGESA versions and microcode.
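After changing these settings, you can check from the running Linux system whether the kernel sees an active IOMMU and SR-IOV-capable devices. The following Python sketch counts the entries under /sys/kernel/iommu_groups and lists PCI devices that expose a sriov_totalvfs attribute. The function names are hypothetical helpers. The result only shows what the kernel reports and is not a direct readout of the BIOS settings, and an empty list may simply mean that no SR-IOV-capable device is installed.

```python
from pathlib import Path


def count_iommu_groups(sysfs_root="/sys"):
    """Count IOMMU groups; 0 suggests the IOMMU is not active in the kernel."""
    groups_dir = Path(sysfs_root) / "kernel" / "iommu_groups"
    if not groups_dir.is_dir():
        return 0
    return sum(1 for entry in groups_dir.iterdir() if entry.is_dir())


def sriov_capable_devices(sysfs_root="/sys"):
    """Map PCI addresses to the maximum number of virtual functions."""
    devices_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    result = {}
    if not devices_dir.is_dir():
        return result
    for device in sorted(devices_dir.iterdir()):
        try:
            # The attribute only exists for SR-IOV-capable physical functions.
            result[device.name] = int((device / "sriov_totalvfs").read_text().strip())
        except (OSError, ValueError):
            continue
    return result


if __name__ == "__main__":
    print("IOMMU groups:", count_iommu_groups())
    for address, total_vfs in sriov_capable_devices().items():
        print(f"{address}: up to {total_vfs} virtual functions")
```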
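To confirm that a BIOS update actually changed the microcode, you can compare the microcode revision before and after the update. The following Python sketch extracts the "microcode" field that Linux reports in /proc/cpuinfo on x86 systems. The function name microcode_revisions is a hypothetical helper, and the sketch assumes that this field is present on your kernel.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo content."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        found = microcode_revisions(handle.read())
    print("microcode revisions:", ", ".join(sorted(found)) or "not reported")
```

More than one revision in the result would indicate that not all cores run the same microcode.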
More information
- random reboots (amd epyc) (forum.proxmox.com, 14.01.2022)
- AMD EPYC based systems rebooting (forum.proxmox.com, 16.05.2019 - 21.09.2022)
- Proxmox Mystery Random Reboots (forum.proxmox.com, 29.03.2023 - 17.07.2023)
- Host server keeps restarting randomly (forum.proxmox.com, 02.03.2023)
- Kernel: "BERT: Error records from previous boot" (access.redhat.com, 20.09.2023)
References
- [1] Random Restarts (forum.proxmox.com)
- [2] first server error, reboot, what is this UUID ? (forums.fedoraforum.org)
Author: Werner Fischer
Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and an author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.
Translator: Alina Ranzinger
Alina has been working at Thomas-Krenn.AG since 2024. After her training as a multilingual business assistant, she took up her position as assistant in Product Management, where she is responsible for translating texts and for organising the department.


