Boot Error Record Table (BERT)

From Thomas-Krenn-Wiki

The Boot Error Record Table (BERT) is an ACPI table through which the firmware of a server system can provide the operating system with information about previous hardware errors. When a hardware error occurs under normal circumstances, the OS kernel is notified directly. However, if the error is so severe that the firmware resets the system or causes it to crash without notifying the OS kernel, the firmware can record information about the error in the BERT.

Basics

The ACPI specification defines the format of the Boot Error Record Table in chapter 18.3.[1]
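To make the table format more concrete, the following is a minimal Python sketch that decodes a raw BERT. It assumes the layout as commonly described for the ACPI specification: the standard 36-byte ACPI table header, followed by a 4-byte boot error region length and an 8-byte boot error region address. The names `BertTable`, `parse_bert`, `checksum_ok`, `region_length`, and `region_address` are illustrative and do not come from the specification or the kernel; check the layout against the specification revision you are working with.

```python
import struct
from typing import NamedTuple

# Assumed layout: 36-byte ACPI header + region length (u32) + region address (u64).
BERT_FORMAT = "<4sIBB6s8sI4sIIQ"
BERT_SIZE = struct.calcsize(BERT_FORMAT)  # 48 bytes under this assumption


class BertTable(NamedTuple):
    signature: bytes         # expected to be b"BERT"
    length: int              # total table length in bytes
    revision: int
    checksum: int
    oem_id: bytes
    oem_table_id: bytes
    oem_revision: int
    creator_id: bytes
    creator_revision: int
    region_length: int       # size of the boot error region in bytes
    region_address: int      # physical address of the boot error region


def parse_bert(raw: bytes) -> BertTable:
    """Decode the fixed-size part of a BERT from raw bytes."""
    if len(raw) < BERT_SIZE:
        raise ValueError(f"expected at least {BERT_SIZE} bytes, got {len(raw)}")
    table = BertTable(*struct.unpack_from(BERT_FORMAT, raw))
    if table.signature != b"BERT":
        raise ValueError(f"unexpected signature {table.signature!r}")
    return table


def checksum_ok(raw: bytes, length: int) -> bool:
    """ACPI tables are valid when all table bytes sum to zero modulo 256."""
    return len(raw) >= length and sum(raw[:length]) % 256 == 0
```

On Linux systems whose firmware provides a BERT, the raw table can usually be read as root from `/sys/firmware/acpi/tables/BERT` and passed to `parse_bert`. The table itself only points at the boot error region; the error records live in that region, not in the table.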

The ACPI Platform Error Interfaces (APEI) are designed to check for hardware errors and report them to the operating system as either a Corrected Error (CE) or an Uncorrected Error (UC).

Under normal circumstances, the OS kernel is notified via a Non-maskable Interrupt (NMI), a machine check exception (MCE), or another method. The kernel processes the error information, reports it, and corrects the error if possible.

Sometimes the error is so serious that the firmware resets the server without informing the OS kernel. On the next boot, the Linux kernel can use the Boot Error Record Table (BERT) to find unreported hardware errors that occurred during the previous boot:[2]

BERT: Error records from previous boot:
[Hardware Error]: It has been corrected by h/w and requires no further action
[Hardware Error]: event severity: corrected
[Hardware Error]:  Error 0, type: recoverable
[Hardware Error]:   section_type: memory error
[Hardware Error]:   error_status: 0x0000000000000400
[Hardware Error]:   physical_address: 0xffffffffffffffff
[Hardware Error]:   card: 1 module: 2 bank: 3 row: 1 column: 2 bit_position: 5
[Hardware Error]:   error_type: 2, single-bit ECC
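Output like the above can be turned into structured data for monitoring. The following is a minimal Python sketch that extracts key/value pairs from `[Hardware Error]` lines. The function name `parse_bert_log` and the parsing rules are assumptions based only on the sample shown here; real kernel output may contain other sections and formats that this sketch does not handle.

```python
import re

# Matches the kernel's "[Hardware Error]:" prefix anywhere in a line,
# so lines with a leading dmesg timestamp are accepted as well.
HW_ERR = re.compile(r"\[Hardware Error\]:\s*(.*)")
# Matches "name: value" pairs on lines that carry several of them.
PAIR = re.compile(r"(\w+): (\S+)")


def parse_bert_log(text: str) -> dict[str, str]:
    """Collect key/value fields from BERT hardware error lines."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = HW_ERR.search(line)
        if not match:
            continue  # e.g. the "BERT: Error records ..." banner line
        payload = match.group(1).strip()
        if payload.count(": ") > 1:
            # e.g. "card: 1 module: 2 bank: 3 ..."
            fields.update(PAIR.findall(payload))
        else:
            key, sep, value = payload.partition(": ")
            if sep:  # lines without "key: value" are free text and are skipped
                fields[key.strip()] = value.strip()
    return fields
```

Applied to the sample above, `parse_bert_log(text)["section_type"]` yields `"memory error"`, and the location line is split into separate `card`, `module`, `bank`, `row`, `column`, and `bit_position` entries.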

Linux Kernel Support

Important changes to BERT support in the Linux kernel:

Log of all patches: /drivers/acpi/apei/bert.c (git.kernel.org)

Linux kernel version    Commit
6.1                     ACPI: APEI: Add BERT error log footer
6.0                     ACPI: APEI: Better fix to avoid spamming the console with old error logs
5.18                    ACPI/APEI: Limit printable size of BERT table data
4.8                     ACPI / APEI: Add Boot Error Record Table (BERT) support
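As a rough aid, the version list above can be checked against a kernel release string. The following Python sketch uses only the versions and commit titles from the table; the names `BERT_CHANGES`, `kernel_version`, and `bert_changes_in` are illustrative. Distribution kernels often backport patches, so the result is an approximation, not a reliable feature test.

```python
import re

# (major, minor) of the first mainline release listed for each change,
# taken from the table above.
BERT_CHANGES = [
    ((4, 8), "ACPI / APEI: Add Boot Error Record Table (BERT) support"),
    ((5, 18), "ACPI/APEI: Limit printable size of BERT table data"),
    ((6, 0), "ACPI: APEI: Better fix to avoid spamming the console with old error logs"),
    ((6, 1), "ACPI: APEI: Add BERT error log footer"),
]


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.1.0-13-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"cannot parse kernel release {release!r}")
    return int(match.group(1)), int(match.group(2))


def bert_changes_in(release: str) -> list[str]:
    """Return the listed BERT changes expected in the given mainline release."""
    version = kernel_version(release)
    return [title for since, title in BERT_CHANGES if version >= since]
```

On a running system, `bert_changes_in(platform.release())` (after `import platform`) applies the check to the current kernel.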

More information

References

  1. 18.3. Error Source Discovery (ACPI Specification Version 6.5)
  2. ACPI / APEI: Add Boot Error Record Table (BERT) support (git.kernel.org, 29.06.2016)


Author: Werner Fischer

Werner Fischer, who works in the Knowledge Transfer team at Thomas-Krenn, completed his studies in Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at conferences such as LinuxTag, OSMC, OSDC, and LinuxCon, and an author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.


Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After training as a multilingual business assistant, she joined Product Management as an assistant and is responsible for translating texts and organising the department.

