AMD EPYC Server with Ubuntu - Enable SATA Hot-Swap

From Thomas-Krenn-Wiki

On AMD EPYC server platforms (both Rome and Milan), SATA hot-swap does not work with the default configuration of Ubuntu 22.04 LTS or Proxmox VE 7.x. The cause is the kernel option CONFIG_SATA_MOBILE_LPM_POLICY=3, with which the Ubuntu kernel is compiled and which reduces the power consumption of mobile devices. The kernel boot option ahci.mobile_lpm_policy=1 fixes the problem. With Microsoft Windows (Server 2022, Server 2019, Windows 10), the hot-swap problems do not occur.

Activate SATA Hot-Swap

Each of the following kernel boot parameters enables hot swap:[1]

  • ahci.mobile_lpm_policy=0
  • ahci.mobile_lpm_policy=1
  • ahci.mobile_lpm_policy=2
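
On Ubuntu, a boot parameter of this kind is typically appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and applied with update-grub followed by a reboot. Whether the running kernel was actually booted with the parameter can then be checked from /proc/cmdline. The following is a minimal sketch, not an official tool; the function names lpm_policy_from_cmdline and hot_swap_expected are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

PARAM = "ahci.mobile_lpm_policy"


def lpm_policy_from_cmdline(cmdline):
    """Return the last ahci.mobile_lpm_policy value on the command line, or None."""
    policy = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        if sep and key.replace("-", "_") == PARAM:
            try:
                policy = int(raw)
            except ValueError:
                pass  # ignore malformed values
    return policy


def hot_swap_expected(policy):
    """True for the values reported to work in the list above (0, 1 or 2)."""
    return policy in (0, 1, 2)


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        policy = lpm_policy_from_cmdline(proc_cmdline.read_text())
        if policy is None:
            print(f"{PARAM} is not set; the compiled-in default applies")
        else:
            print(f"{PARAM}={policy}, hot swap expected: {hot_swap_expected(policy)}")
```

If the parameter is missing from the command line, the value compiled into the kernel applies, which is 3 on the Ubuntu 22.04 kernel.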

Background information

CONFIG_SATA_MOBILE_LPM_POLICY configuration options

When compiling the Linux kernel, the default SATA Link Power Management (LPM) policy for mobile / laptop variants of chipsets ("South Bridges") can be selected with CONFIG_SATA_MOBILE_LPM_POLICY.

The following policies are available for selection:[2][3]

CONFIG_SATA_MOBILE_LPM_POLICY   Description                                       ata_lpm_policy_names[][4] / ata_lpm_policy[5]
0                               Keep firmware settings (Vanilla Kernel Default)   ATA_LPM_UNKNOWN
1                               Maximum performance                               ATA_LPM_MAX_POWER
2                               Medium power                                      ATA_LPM_MED_POWER
3                               Medium power with Device Initiated PM enabled     ATA_LPM_MED_POWER_WITH_DIPM
                                                                                  ATA_LPM_MIN_POWER_WITH_PARTIAL
4                               Minimum power                                     ATA_LPM_MIN_POWER

Note: The "Minimum power" setting is known to cause problems with some SSDs/HDDs and should therefore not be used.[6]

The Ubuntu 22.04 kernel 5.15 is compiled with the option CONFIG_SATA_MOBILE_LPM_POLICY=3.[1] With this setting, an unused port is powered off during the boot process and is not powered back on afterwards.[7] This saves energy and is especially important for mobile devices because it extends battery runtime. On server systems, however, hot-swapping does not work with this setting. During the development of Linux kernel 5.19, making CONFIG_SATA_MOBILE_LPM_POLICY=3 the general default was considered, but the idea was abandoned because of the hot-plug problems.[8][9]
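
The policy that is actually in effect on a running system can be read per SATA host from sysfs. The sketch below assumes the standard libata attribute /sys/class/scsi_host/hostN/link_power_management_policy. The helper name read_link_policies is hypothetical, and the string "med_power_with_dipm" is assumed to be the sysfs name for ATA_LPM_MED_POWER_WITH_DIPM (policy 3).

```python
from pathlib import Path

# Assumed sysfs name of policy 3 (ATA_LPM_MED_POWER_WITH_DIPM).
POLICY_3_NAME = "med_power_with_dipm"


def read_link_policies(base=Path("/sys/class/scsi_host")):
    """Map each SCSI host name (e.g. 'host4') to its current LPM policy string."""
    policies = {}
    for attr in sorted(Path(base).glob("host*/link_power_management_policy")):
        policies[attr.parent.name] = attr.read_text().strip()
    return policies


if __name__ == "__main__":
    for host, policy in read_link_policies().items():
        note = "  <- policy 3, hot swap may not work" if policy == POLICY_3_NAME else ""
        print(f"{host}: {policy}{note}")
```

Hosts that do not belong to an ATA controller have no such attribute and are skipped by the glob pattern.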

SATA hot-swap problem examples

The examples below were produced with this sample setup:

  • Supermicro H12SSL-NT mainboard (both hardware revision 1.01 and 1.02)
  • BIOS version 2.3 and 2.4
  • SC825TQC-R802LPB chassis
  • BPN-SAS3-825TQ backplane (problem can also be reproduced without backplane)
  • SATA disk directly connected to the mainboard via Slim-SAS cable (when using a Microsemi Adaptec HBA 1000-8i there is no hot-swap problem)

The hot-swap problems only occur on AMD EPYC systems with the default configuration of Ubuntu 22.04 (without the kernel boot option ahci.mobile_lpm_policy=1).

In a test with a Supermicro X12DPi-N6 motherboard with 3rd Generation Intel Xeon Scalable processors, there were no problems in the default configuration of Ubuntu 22.04.

Hot-add test

When a SATA disk is plugged in, the new disk is not recognized automatically.

Hot-removal test

The removal of a SATA disk during operation (hot-removal) is initially not recognized by the Ubuntu kernel.

Error messages appear only when a read or write access to the affected disk is attempted:

[ 248.127538] ata5.00: exception Emask 0x10 SAct 0x1000000 SErr 0x90202 action 0x6 frozen
[ 248.129086] ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
[ 248.129859] ata5.00: failed command: READ FPDMA QUEUED
[ 248.130630] ata5.00: cmd 60/00:c0:00:00:00/01:00:00:00:00/40 tag 24 ncq dma 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[ 248.132201] ata5.00: status: { DRDY }
[ 248.132985] ata5: hard resetting link
[ 248.447963] ata5: SATA link down (SStatus 0 SControl 300)
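
Because the removal only becomes visible through these messages, it can be useful to scan the kernel log for them. The following sketch extracts the SError flags and the "SATA link down" event per ATA port from log text in the format shown above. The function name summarize_ata_log and the regular expressions are illustrative assumptions and cover only the two message types used here.

```python
import re

SERROR_RE = re.compile(r"(ata\d+): SError: \{([^}]*)\}")
LINK_DOWN_RE = re.compile(r"(ata\d+): SATA link down")


def summarize_ata_log(log_text):
    """Collect SError flags and link-down events per ATA port."""
    summary = {}
    for line in log_text.splitlines():
        match = SERROR_RE.search(line)
        if match:
            entry = summary.setdefault(match.group(1), {"serror": [], "link_down": False})
            entry["serror"] = match.group(2).split()
        match = LINK_DOWN_RE.search(line)
        if match:
            entry = summary.setdefault(match.group(1), {"serror": [], "link_down": False})
            entry["link_down"] = True
    return summary


# Two lines taken from the log excerpt above.
sample = """\
[ 248.129086] ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
[ 248.447963] ata5: SATA link down (SStatus 0 SControl 300)
"""

for port, info in summarize_ata_log(sample).items():
    print(port, info["serror"], "link down" if info["link_down"] else "link up")
```

On a live system, the text could come from the output of dmesg instead of the sample string.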

References

  1. Comment 51 for bug 1971576 (bugs.launchpad.net) On AMD EPYC, both ROME and Milan server platforms, SATA hot plug not working on Ubuntu 22.04 LTS. [...] Ubuntu kernel compile with configure CONFIG_SATA_MOBILE_LPM_POLICY=3. [...] To enable hotplug by adding below kernel parameter can make sata hot plug working ahci.mobile_lpm_policy=1. In our test, set to 0 or 2 also works.
  2. drivers/ata/Kconfig - Line 118 (git.kernel.org) [...] Select the Default SATA Link Power Management (LPM) policy to use for mobile / laptop variants of chipsets / "South Bridges". [...]
  3. Default SATA Link Power Management policy for low power chipsets - configname: CONFIG_SATA_LPM_POLICY (www.kernelconfig.io)
  4. ata_lpm_policy_names (elixir.bootlin.com)
  5. ata_lpm_policy (elixir.bootlin.com)
  6. ATA_HORKAGE_NOLPM (elixir.bootlin.com)
  7. Comment 46 for bug 1971576 (bugs.launchpad.net) When policy is set this way the code will look and see whether any links are enabled and power the port off if not. When the port is powered off, it doesn't get powered back on.
  8. RE: (PATCH 1/2) ahci: Add PhyRdy Change control on actual LPM capability (linux-ide Mailing List, 18.05.2022) This regression happened because it got brought back to 5.4-stable, and as it turns out the exact same FCH controller ID from the client silicon is used in another product. It's an ASUS server system with AMD Epyc processor. The regression is specifically along hotplug, that hotplug no longer works with the more aggressive policies.
  9. (PATCH 3/3) ahci: Document the loss of hotplug by new LPM policy (linux-ide Mailing List, 24.05.2022) Per AHCI spec v1.3.1, "7.3 Native Hot Plug Support", once LPM is enabled hotplug support needs to be disabled.


Author: Werner Fischer

Werner Fischer, working in the Knowledge Transfer team at Thomas-Krenn, completed his studies of Computer and Media Security at FH Hagenberg in Austria. He is a regular speaker at many conferences like LinuxTag, OSMC, OSDC, LinuxCon, and author for various IT magazines. In his spare time he enjoys playing the piano and training for a good result at the annual Linz marathon relay.

