Fixing the ASUS RS500A-E10-RS12U boot loop

From Thomas-Krenn-Wiki

On the ASUS RS500A-E10-RS12U server, problems can occur with older BIOS versions when additional NVMe SSDs are added. During a hot-add they manifest as SATA error messages; with a cold-add, the system gets stuck in a boot loop on startup. A BIOS update to version 4301 solves the problem.
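Whether a system is affected can be checked before touching any hardware by reading the installed BIOS version, e.g. with `dmidecode -s bios-version` (as root). A minimal sketch of such a check, with the version string hard-coded as a stand-in for the `dmidecode` output:

```shell
#!/bin/sh
# Compare the installed BIOS version against 4301, the first fixed release.
# On a live system the value would come from: dmidecode -s bios-version
# (requires root); here it is hard-coded for illustration.
BIOS_VERSION=4003
if [ "$BIOS_VERSION" -lt 4301 ]; then
    echo "BIOS $BIOS_VERSION is affected - update to 4301 or newer"
else
    echo "BIOS $BIOS_VERSION already contains the fix"
fi
```

With the versions named in this article (0501 and 4003) the check prints the update hint; from 4301 on it passes.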

Affected hardware

  • ASUS RS500A-E10-RS12U with EPYC 7402P in the following configuration:
    • BIOS versions 0501 (release date: 11/07/2019) and 4003 (07/20/2020) were tested (the problem occurred on both versions)
    • Slot 11 + 12: SATA SSDs (Samsung MZ7KH240HAHQ-00005)
      • NVMe adapter card for slots 11 + 12 removed (a second Mellanox 25GbE ConnectX-5 EN SFP28 Dual Port network card was installed in this slot instead)
    • Asus 10 Gigabit RJ45 Dual Port Mezzanine network card
    • 8x 64 GB RAM LRDIMM Samsung
    • 1x Intel P4500 SSDPE2KX020T7
    • 4x Intel P4510 SSDPE2KX020T8

The following problems were observed when an additional NVMe SSD was added:

  • 1x Intel/Solidigm D7-P5520 SSDPF2KX019T1M

Problems

Hot-Add

The following problem occurred on a system with BIOS version 0501 (release date: 11/07/2019) when a sixth NVMe SSD was connected to the backplane during operation (hot-add). Exactly 30 seconds after nvme5 was detected, SATA errors appeared in dmesg. All SATA links were reset, which also reset the two SATA SSDs holding the operating system (configured as a software RAID 1):

[Fr Okt  7 08:10:09 2022] pcieport 0000:40:01.4: pciehp: Slot(5-1): Card present
[Fr Okt  7 08:10:09 2022] pcieport 0000:40:01.4: pciehp: Slot(5-1): Link Up
[...]
[Fr Okt  7 08:10:09 2022] nvme nvme5: pci function 0000:42:00.0
[Fr Okt  7 08:10:09 2022] nvme 0000:42:00.0: enabling device (0000 -> 0002)
[Fr Okt  7 08:10:12 2022] nvme nvme5: 133/0/2 default/read/poll queues
[Fr Okt  7 08:10:39 2022] ata16.00: exception Emask 0x52 SAct 0x0 SErr 0xffffffff action 0xe frozen
[Fr Okt  7 08:10:39 2022] ata16: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
[Fr Okt  7 08:10:39 2022] ata16.00: failed command: FLUSH CACHE EXT
[Fr Okt  7 08:10:39 2022] ata16.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 10
                                   res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[Fr Okt  7 08:10:39 2022] ata16.00: status: { DRDY }
[Fr Okt  7 08:10:39 2022] ata16: hard resetting link
[Fr Okt  7 08:10:39 2022] ahci 0000:48:00.0: AHCI controller unavailable!
[Fr Okt  7 08:10:40 2022] ata16: failed to resume link (SControl FFFFFFFF)
[Fr Okt  7 08:10:40 2022] ata16: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
[Fr Okt  7 08:10:45 2022] ata16: hard resetting link
[Fr Okt  7 08:10:45 2022] ahci 0000:48:00.0: AHCI controller unavailable!
[Fr Okt  7 08:10:45 2022] ata15.00: exception Emask 0x52 SAct 0x2 SErr 0xffffffff action 0xe frozen
[Fr Okt  7 08:10:45 2022] ata15: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
[Fr Okt  7 08:10:45 2022] ata15.00: failed command: READ FPDMA QUEUED
[Fr Okt  7 08:10:45 2022] ata15.00: cmd 60/60:08:b0:52:45/00:00:01:00:00/40 tag 1 ncq dma 49152 in
                                   res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[Fr Okt  7 08:10:45 2022] ata15.00: status: { DRDY }
[Fr Okt  7 08:10:45 2022] ata15: hard resetting link
[Fr Okt  7 08:10:45 2022] ahci 0000:48:00.0: AHCI controller unavailable!
[Fr Okt  7 08:10:47 2022] ata15: failed to resume link (SControl FFFFFFFF)
[Fr Okt  7 08:10:47 2022] ata15: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
[Fr Okt  7 08:10:52 2022] ata15: hard resetting link
[...]
[Fr Okt  7 08:11:03 2022] md: super_written gets error=10
[Fr Okt  7 08:11:03 2022] md/raid1:md1: Disk failure on sdb3, disabling device.
                          md/raid1:md1: Operation continuing on 1 devices.
[...]
[Fr Okt  7 08:11:04 2022] sd 14:0:0:0: [sda] Stopping disk
[Fr Okt  7 08:11:04 2022] sd 14:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Fr Okt  7 08:11:04 2022] md/raid1:md0: sda2: unrecoverable I/O read error for block 1638400
[Fr Okt  7 08:11:04 2022] md: super_written gets error=10
[Fr Okt  7 08:11:04 2022] md: super_written gets error=10
[...]
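After such an incident, the state of the software RAID 1 should be checked via /proc/mdstat: a degraded mirror shows `[2/1]` and `[U_]` (or `[_U]`) instead of `[UU]`. A small sketch of this check, where the heredoc stands in for /proc/mdstat on the affected system:

```shell
#!/bin/sh
# Detect a degraded md RAID 1 after the SATA link resets shown above.
# The heredoc mimics /proc/mdstat on the affected system; on a live host
# you would read /proc/mdstat directly.
cat <<'EOF' > /tmp/mdstat.sample
md1 : active raid1 sda3[0] sdb3[1](F)
      233281536 blocks super 1.2 [2/1] [U_]
EOF
if grep -q '\[U_\]\|\[_U\]' /tmp/mdstat.sample; then
    echo "md1 is degraded - re-add the failed disk with mdadm once the link is back"
fi
```

On a live system, the failed member would then be re-added with `mdadm --re-add` (or `--add`) after the SATA link has recovered.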

Cold-Add

Problems also occur if the sixth NVMe SSD is added while the server is powered off. The server gets stuck in a boot loop during startup.

We were also able to reproduce the problem with BIOS version 4003 (07/20/2020) and BMC version 2.03.1.

Solution: BIOS 4301

Updating the BIOS to version 4301 (11/19/2021), in combination with BMC version 2.03.1, solves the problems.

The following dmesg excerpt shows the successful addition of a seventh NVMe SSD:

[Mi Dez 14 18:54:02 2022] pcieport 0000:40:01.3: pciehp: Slot(6): Card present
[Mi Dez 14 18:54:02 2022] pcieport 0000:40:01.3: pciehp: Slot(6): Link Up
[Mi Dez 14 18:54:02 2022] pci 0000:41:00.0: [8086:0b60] type 00 class 0x010802
[Mi Dez 14 18:54:02 2022] pci 0000:41:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[Mi Dez 14 18:54:02 2022] pci 0000:41:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[Mi Dez 14 18:54:02 2022] pci 0000:41:00.0: enabling Extended Tags
[Mi Dez 14 18:54:02 2022] pci 0000:41:00.0: Adding to iommu group 28
[Mi Dez 14 18:54:02 2022] pci 0000:41:00.0: BAR 6: assigned [mem 0xb1100000-0xb110ffff pref]
[Mi Dez 14 18:54:02 2022] pci 0000:41:00.0: BAR 0: assigned [mem 0xb1110000-0xb1113fff 64bit]
[Mi Dez 14 18:54:02 2022] pcieport 0000:40:01.3: PCI bridge to [bus 41]
[Mi Dez 14 18:54:02 2022] pcieport 0000:40:01.3:   bridge window [io  0x4000-0x4fff]
[Mi Dez 14 18:54:02 2022] pcieport 0000:40:01.3:   bridge window [mem 0xb1100000-0xb11fffff]
[Mi Dez 14 18:54:02 2022] pcieport 0000:40:01.3:   bridge window [mem 0x20080200000-0x200803fffff 64bit pref]
[Mi Dez 14 18:54:02 2022] nvme nvme6: pci function 0000:41:00.0
[Mi Dez 14 18:54:02 2022] nvme 0000:41:00.0: enabling device (0000 -> 0002)
[Mi Dez 14 18:54:05 2022] nvme nvme6: 48/0/2 default/read/poll queues
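Whether a hot-add went through cleanly can be verified by confirming that the new controller appears in the kernel log without any accompanying SATA errors. A sketch of such a check against a saved dmesg excerpt (the heredoc stands in for `dmesg` output on a live system):

```shell
#!/bin/sh
# Verify a clean NVMe hot-add: the new controller must be present and
# no SATA link errors may appear alongside it. The heredoc replays the
# excerpt above; on a live system, inspect dmesg directly.
cat <<'EOF' > /tmp/dmesg.sample
nvme nvme6: pci function 0000:41:00.0
nvme nvme6: 48/0/2 default/read/poll queues
EOF
if grep -q 'nvme nvme6' /tmp/dmesg.sample && ! grep -q 'SATA link down' /tmp/dmesg.sample; then
    echo "hot-add of nvme6 completed without SATA side effects"
fi
```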

Author: Werner Fischer

Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies in Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at conferences such as LinuxTag, OSMC, OSDC, and LinuxCon, and writes for various IT magazines. In his spare time he enjoys playing the piano and training for the annual Linz marathon relay.


Translator: Alina Ranzinger

Alina has been working at Thomas-Krenn.AG since 2024. After training as a multilingual business assistant, she became an assistant in Product Management and is responsible for translating texts and organising the department.


Related articles

ASMB update via DOS-stick
ASUS P11C-M/4L BIOS update
ASUS server BIOS settings via Web