Random Reboots on AMD EPYC Servers

From Thomas-Krenn-Wiki

On servers with AMD EPYC CPUs, random reboots can occur when running Linux with KVM. After such a reboot, the syslog contains Boot Error Record Table (BERT) entries with the hint "fru_text: ProcessorError". This article provides detailed information on this error and proposed solutions.

Problem "fru_text: ProcessorError"

Random reboots occur on servers with AMD EPYC CPUs when running Linux with KVM. After a reboot, the kernel log contains entries like the following (log excerpt from November 2019):

Nov 4 15:43:07 debian-10 kernel: [ 1.569903] BERT: Error records from previous boot:
Nov 4 15:43:07 debian-10 kernel: [ 1.570000] [Hardware Error]: event severity: info
Nov 4 15:43:07 debian-10 kernel: [ 1.570095] [Hardware Error]: Error 0, type: fatal
Nov 4 15:43:07 debian-10 kernel: [ 1.570191] [Hardware Error]: fru_text: ProcessorError
Nov 4 15:43:07 debian-10 kernel: [ 1.570288] [Hardware Error]: section_type: IA32/X64 processor error
Nov 4 15:43:07 debian-10 kernel: [ 1.570389] [Hardware Error]: Local APIC_ID: 0x0
Nov 4 15:43:07 debian-10 kernel: [ 1.570484] [Hardware Error]: CPUID Info:
Nov 4 15:43:07 debian-10 kernel: [ 1.570579] [Hardware Error]: 00000000: 00800f12 00000000 00300800 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.570682] [Hardware Error]: 00000010: 76d8320b 00000000 178bfbff 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.570786] [Hardware Error]: 00000020: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0
Nov 4 15:43:07 debian-10 kernel: [ 1.570889] [Hardware Error]: Error Information Structure 0:
Nov 4 15:43:07 debian-10 kernel: [ 1.570988] [Hardware Error]: Error Structure Type: unknown
Nov 4 15:43:07 debian-10 kernel: [ 1.571087] [Hardware Error]: Error Structure Type: 00000001-0000-0000-2700-980000000000
Nov 4 15:43:07 debian-10 kernel: [ 1.571221] [Hardware Error]: Error 1, type: fatal
Nov 4 15:43:07 debian-10 kernel: [ 1.571316] [Hardware Error]: fru_text: ProcessorError
Nov 4 15:43:07 debian-10 kernel: [ 1.571412] [Hardware Error]: section_type: IA32/X64 processor error
Nov 4 15:43:07 debian-10 kernel: [ 1.571513] [Hardware Error]: Local APIC_ID: 0x1
Nov 4 15:43:07 debian-10 kernel: [ 1.571608] [Hardware Error]: CPUID Info:
Nov 4 15:43:07 debian-10 kernel: [ 1.571701] [Hardware Error]: 00000000: 00800f12 00000000 01300800 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.571805] [Hardware Error]: 00000010: 76d8320b 00000000 178bfbff 00000000
Nov 4 15:43:07 debian-10 kernel: [ 1.571908] [Hardware Error]: 00000020: a55701f5 43dee3ef 9b2472ac 2cad3f57
Nov 4 15:43:07 debian-10 kernel: [ 1.572011] [Hardware Error]: Error Information Structure 0:
Nov 4 15:43:07 debian-10 kernel: [ 1.572109] [Hardware Error]: Error Structure Type: unknown
Nov 4 15:43:07 debian-10 kernel: [ 1.572208] [Hardware Error]: Error Structure Type: 00000001-0000-0000-1f00-4d0600000000
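
To check several machines for this pattern, the BERT entries can be pulled out of a saved syslog or dmesg dump with a short script. The following is a minimal sketch, assuming the line layout shown in the excerpt above; the function name parse_bert_records and the regular expressions are illustrative and not part of any kernel tooling.

```python
import re

# Payload that follows the kernel's "[Hardware Error]:" prefix.
HW_ERR = re.compile(r"\[Hardware Error\]:\s*(?P<payload>.*)$")
# First line of each record, e.g. "Error 0, type: fatal".
ERROR_START = re.compile(r"^Error (?P<index>\d+), type: (?P<severity>\w+)$")


def parse_bert_records(lines):
    """Group '[Hardware Error]' lines into one dict per 'Error N' record.

    Hypothetical helper: it only understands the layout of the excerpt above.
    """
    records = []
    current = None
    for line in lines:
        match = HW_ERR.search(line)
        if not match:
            continue  # not a hardware error line
        payload = match.group("payload").strip()
        start = ERROR_START.match(payload)
        if start:
            current = {
                "index": int(start.group("index")),
                "severity": start.group("severity"),
            }
            records.append(current)
        elif current is not None and ": " in payload:
            key, value = payload.split(": ", 1)
            # Keep the first value seen for a key within a record.
            current.setdefault(key.strip(), value.strip())
    return records


if __name__ == "__main__":
    import sys

    # Usage: python3 bert_parse.py < /var/log/syslog
    for record in parse_bert_records(sys.stdin):
        print(record["index"], record["severity"],
              record.get("fru_text"), record.get("Local APIC_ID"))
```

Lines that precede the first "Error N" line (such as "event severity: info") are ignored by this sketch, as are header lines without a value (such as "CPUID Info:").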

Affected systems

The described problem occurred on a system with the following configuration:

  • Supermicro Mainboard H11DSi-NT
  • 2x AMD EPYC 7401
  • Debian GNU/Linux 10 (with KVM)
  • Linux Kernel: 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20) x86_64 GNU/Linux
  • BIOS version: BIOS 1.0c
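
As a cross-check between the log excerpt and the CPUs listed above, the first dword of the "CPUID Info" dump (00800f12) can be decoded into family, model and stepping. The sketch below assumes that this dword holds the EAX value of CPUID leaf 1, as the layout of the dump suggests; the function name decode_cpuid_signature is hypothetical.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping).

    Assumption: the first dword of the 'CPUID Info' dump is this EAX value.
    """
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_cpuid_signature(0x00800F12)
    # Prints: family 0x17, model 0x1, stepping 2
    print(f"family {family:#x}, model {model:#x}, stepping {stepping}")
```

The decoded family 17h is consistent with a first-generation EPYC CPU such as the 7401 listed above.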

The Proxmox forum also contains reports of random reboots with the BERT error message "fru_text: ProcessorError" on the following mainboards:[1]

  • Supermicro H11SSL-i
  • Supermicro H11DSU-iN

The Fedora forum also contains a report on the error message "fru_text: ProcessorError".[2]

Proposed solution

Users in the Fedora forum reported that changing the following BIOS parameters solved the problem:[2]

Advanced -> NB Configuration -> IOMMU (change to Enabled)
Advanced -> PCIe/PCI/PnP Configuration -> SR-IOV Support (change to Enabled)
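
After changing the IOMMU setting, it can be useful to confirm from the running Linux system that the kernel actually uses an IOMMU. The following is a minimal sketch, assuming that an active IOMMU shows up as numbered subdirectories under /sys/kernel/iommu_groups; the function name count_iommu_groups and its sysfs_root parameter are hypothetical. The sketch does not verify the SR-IOV setting.

```python
from pathlib import Path


def count_iommu_groups(sysfs_root="/sys"):
    """Return the number of IOMMU groups exposed by the kernel.

    Assumption: a count of 0 means the IOMMU is disabled or unavailable.
    """
    groups_dir = Path(sysfs_root) / "kernel" / "iommu_groups"
    if not groups_dir.is_dir():
        return 0
    return sum(1 for entry in groups_dir.iterdir() if entry.is_dir())


if __name__ == "__main__":
    groups = count_iommu_groups()
    if groups:
        print(f"IOMMU appears active ({groups} groups)")
    else:
        print("No IOMMU groups found; check the BIOS setting and kernel log")
```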

In general, we recommend updating to the latest BIOS version. Newer BIOS versions contain newer AMD AGESA versions and CPU microcode.
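
To record the state before and after a BIOS update, the installed BIOS version and the loaded microcode revision can be read from the running system. The sketch below assumes the usual Linux locations /sys/class/dmi/id/bios_version, /sys/class/dmi/id/bios_date and the "microcode" field in /proc/cpuinfo; the helper names are hypothetical.

```python
from pathlib import Path


def read_text(path):
    """Return the stripped content of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' values from /proc/cpuinfo content."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    print("BIOS version:", read_text("/sys/class/dmi/id/bios_version"))
    print("BIOS date:   ", read_text("/sys/class/dmi/id/bios_date"))
    cpuinfo = read_text("/proc/cpuinfo") or ""
    print("Microcode:   ", sorted(microcode_revisions(cpuinfo)))
```

More than one microcode revision in the output would indicate that the cores are not all running the same microcode.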

More information

References

  1. Random Restarts (forum.proxmox.com)
  2. first server error, reboot, what is this UUID ? (forums.fedoraforum.org)


Author: Werner Fischer



Translator: Alina Ranzinger


