Bei Linux Systemen mit NVIDIA Mellanox Netzwerkkarten der ConnectX-4/5/6 Serien kommt es mit den Linux Kernel Versionen 5.13.*, 5.14.* und 5.15.0 - 5.15.19 zu Bitfehlern sofern optische Transceiver verwendet werden. Die Ausgabe von ethtool -m [INTERFACE] zeigt eine Laser wavelength von 22560nm anstelle der üblichen 850nm. Konfigurationen mit DAC Kabeln zeigen keine derartigen Probleme. Ursache ist die Treiberänderung net/mlx5: Refactor module EEPROM query[1], welche mit Linux Kernel 5.13.0[2] implementiert wurde. Mit Linux Kernel 5.15.20[3] wurde das Problem mit der Treiberänderung net/mlx5e: Fix module EEPROM query[4] behoben. Potentiell betroffen sind Proxmox VE 7.1 Systeme (Linux Kernel 5.13) sowie Ubuntu 20.04 LTS Systeme mit aktiviertem Ubuntu LTS Hardware Enablement Stack (ebenso Linux Kernel 5.13). Ab Linux Kernel 5.16 ist das Problem jedenfalls behoben, ebenso mit Kernel 5.15.25 und .26. Als Workaround empfehlen wir für Promox VE und Ubuntu Systeme vorerst bei Linux Kernel 5.11 zu bleiben und erst nach Verfügbarkeit eines jeweils zurück portierten Bugfixes auf die neueren Versionen zu wechseln.
Wir haben die Entwickler von Ubuntu und Proxmox VE über diesen Bug informiert:
Testsystem:
Anmerkung: Bei der Longterm Linux Kernel Version 5.15 gab es bei folgenden Releases Änderungen am mlx5 Treiber:
Mit Linux Kernel 5.11 läuft es stabil. Die Probleme könnten durch die umfangreichen Änderungen am mlx5 Treiber mit Kernel 5.12 und 5.13 verursacht worden sein:
Mit Kernel 5.11 zeigt die Ausgabe von ethtool -m auch die korrekte Wellenlänge von 850nm (getestet mit Firmware 14.30.1004):
Identifier : 0x03 (SFP) Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID) Connector : 0x07 (LC) Transceiver codes : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Transceiver type : 10G Ethernet: 10G Base-SR Encoding : 0x06 (64B/66B) BR, Nominal : 10300MBd Rate identifier : 0x00 (unspecified) Length (SMF,km) : 0km Length (SMF) : 0m Length (50um) : 80m Length (62.5um) : 30m Length (Copper) : 0m Length (OM3) : 300m Laser wavelength : 850nm Vendor name : FLEXOPTIX Vendor OUI : 38:86:02 Vendor PN : P.8596.02 Vendor rev : A Option values : 0x00 0x1a Option : RX_LOS implemented Option : TX_FAULT implemented Option : TX_DISABLE implemented BR margin, max : 0% BR margin, min : 0% Vendor SN : F79BECY Date code : 200108 Optical diagnostics support : Yes Laser bias current : 5.874 mA Laser output power : 0.5370 mW / -2.70 dBm Receiver signal average optical power : 0.5630 mW / -2.49 dBm Module temperature : 28.96 degrees C / 84.12 degrees F Module voltage : 3.2872 V Alarm/warning flags implemented : Yes Laser bias current high alarm : Off Laser bias current low alarm : Off Laser bias current high warning : Off Laser bias current low warning : Off Laser output power high alarm : Off Laser output power low alarm : Off Laser output power high warning : Off Laser output power low warning : Off Module temperature high alarm : Off Module temperature low alarm : Off Module temperature high warning : Off Module temperature low warning : Off Module voltage high alarm : Off Module voltage low alarm : Off Module voltage high warning : Off Module voltage low warning : Off Laser rx power high alarm : Off Laser rx power low alarm : Off Laser rx power high warning : Off Laser rx power low warning : Off Laser bias current high alarm threshold : 50.000 mA Laser bias current low alarm threshold : 1.000 mA Laser bias current high warning threshold : 40.000 mA Laser bias current low warning threshold : 2.000 mA Laser output power high alarm threshold : 1.2589 mW / 1.00 dBm Laser output power low alarm threshold : 0.1175 mW / -9.30 dBm Laser output power high warning threshold : 1.0000 mW / 0.00 dBm Laser output power low warning threshold : 0.1479 mW / -8.30 dBm Module temperature high alarm threshold : 90.00 degrees C / 194.00 degrees F Module temperature low alarm threshold : -25.00 degrees C / -13.00 degrees F Module temperature high warning threshold : 85.00 degrees C / 185.00 degrees F Module temperature low warning threshold : -20.00 degrees C / -4.00 degrees F Module voltage high alarm threshold : 3.6000 V Module voltage low alarm threshold : 3.0000 V Module voltage high warning threshold : 3.5000 V Module voltage low warning threshold : 3.0500 V Laser rx power high alarm threshold : 1.2589 mW / 1.00 dBm Laser rx power low alarm threshold : 0.0490 mW / -13.10 dBm Laser rx power high warning threshold : 1.0000 mW / 0.00 dBm Laser rx power low warning threshold : 0.0617 mW / -12.10 dBm
Es ist grundsätzlich ein Link vorhanden, jedoch zeigen Monitoring Tools Bitfehler und die Ausgabe von ethtool -m zeigt eine falsche Wellenläge:
root@pve:~# ethtool -m enp1s0f0np0
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-SR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 0km
Length (SMF) : 0m
Length (50um) : 80m
Length (62.5um) : 30m
Length (Copper) : 0m
Length (OM3) : 300m
Laser wavelength : 22560nm
Vendor name : FLEXOPTIX ____
Vendor OUI : 00:00:00
Vendor PN : ____g_______FLEX
Vendor rev : OPTI
Option values : 0x03 0x04
Option : RX_LOS implemented, inverted
Option : Linear receiver output implemented
Option : Power level 2 requirement
BR margin, max : 7%
BR margin, min : 16%
Vendor SN : ________g_______
Date code : FLEXOPTI
Optical diagnostics support : Yes
Laser bias current : 0.000 mA
Laser output power : 0.0030 mW / -25.23 dBm
Receiver signal average optical power : 0.0000 mW / -inf dBm
Module temperature : 25.33 degrees C / 77.60 degrees F
Module voltage : 3.3902 V
Alarm/warning flags implemented : No
root@pve:~# dmesg | grep "firmware version"
[ 1.718833] mlx5_core 0000:01:00.0: firmware version: 14.32.1010
[ 2.053317] mlx5_core 0000:01:00.1: firmware version: 14.32.1010
root@pve:~# uname -a
Linux pve 5.13.19-4-pve #1 SMP PVE 5.13.19-9 (Mon, 07 Feb 2022 11:01:14 +0100) x86_64 GNU/Linux
root@pve:~# modinfo mlx5_core
filename: /lib/modules/5.13.19-4-pve/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
srcversion: E8D73322E55281A023CCA31
alias: auxiliary:mlx5_core.eth
alias: pci:v000015B3d0000A2DCsv*sd*bc*sc*i*
[...]
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.sf
depends: tls,pci-hyperv-intf,mlxfw,psample
retpoline: Y
intree: Y
name: mlx5_core
vermagic: 5.13.19-4-pve SMP mod_unload modversions
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)
root@pve:~# ethtool -i enp1s0f0np0
driver: mlx5_core
version: 5.13.19-4-pve
firmware-version: 14.32.1010 (MT_2420110004)
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
root@pve:~# ethtool enp1s0f0np0
Settings for enp1s0f0np0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseKX/Full
10000baseKR/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: 1000baseKX/Full
10000baseKR/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: None RS BASER
Speed: 10000Mb/s
Duplex: Full
Auto-negotiation: on
Port: FIBRE
PHYAD: 0
Transceiver: internal
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000004 (4)
link
Link detected: yes
root@pve:~# dmidecode -t bios
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: 0401
Release Date: 11/19/2021
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16 MB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 4.1
Handle 0x004E, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 1
en|US|iso8859-1
Currently Installed Language: en|US|iso8859-1
root@pve:~# lspci -s 01:00.0 -vvv -nn
01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT [15b3:0004]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 98
IOMMU group: 68
Region 0: Memory at 380ac000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at f8c00000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <4us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x8 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [48] Vital Product Data
Product Name: CX4121A - ConnectX-4 LX SFP28
Read-only fields:
[PN] Part number: MCX4121A-XCAT
[EC] Engineering changes: AM
[SN] Serial number: MT2130K05338
[V0] Vendor specific: PCIeGen3 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 1016
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 00000380ae800000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
root@pve:~# lspci -s 01:00.1 -vvv -nn
01:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT [15b3:0004]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 201
IOMMU group: 69
Region 0: Memory at 380aa000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at f8b00000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <4us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x8 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [48] Vital Product Data
Product Name: CX4121A - ConnectX-4 LX SFP28
Read-only fields:
[PN] Part number: MCX4121A-XCAT
[EC] Engineering changes: AM
[SN] Serial number: MT2130K05338
[V0] Vendor specific: PCIeGen3 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 01
VF offset: 9, stride: 1, Device ID: 1016
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 00000380ae000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
root@pve:~# cat /proc/partitions
major minor #blocks name
8 0 234431064 sda
8 1 1007 sda1
8 2 524288 sda2
8 3 233905735 sda3
253 0 7340032 dm-0
253 1 58458112 dm-1
253 2 1515520 dm-2
253 3 148299776 dm-3
253 4 148299776 dm-4
11 0 1048575 sr0
root@pve:~#
Als nächste Schritte werden wir veranlassen:
|
Autor: Werner Fischer Werner Fischer arbeitet im Product Management Team von Thomas-Krenn. Er evaluiert dabei neueste Technologien und teilt sein Wissen in Fachartikeln, bei Konferenzen und im Thomas-Krenn Wiki. Bereits 2005 - ein Jahr nach seinem Abschluss des Studiums zu Computer- und Mediensicherheit an der FH Hagenberg - heuerte er beim bayerischen Server-Hersteller an. Als Öffi-Fan nutzt er gerne Bus & Bahn und genießt seinen morgendlichen Spaziergang ins Büro. |