On Linux systems with NVIDIA Mellanox network cards of the ConnectX-4/5/6 series, Linux kernel versions 5.13.*, 5.14.*, and 5.15.0 to 5.15.19 cause bit errors when optical transceivers are used. The output of ethtool -m [INTERFACE] shows a laser wavelength of 22560nm instead of the usual 850nm. Configurations with DAC cables do not show such problems.

The cause is the driver change net/mlx5: Refactor module EEPROM query[1], which was introduced with Linux kernel 5.13.0[2]. Linux kernel 5.15.20[3] fixed the problem with the driver change net/mlx5e: Fix module EEPROM query[4].

Potentially affected are Proxmox VE 7.1 systems (Linux kernel 5.13) and Ubuntu 20.04 LTS systems with the Ubuntu LTS Hardware Enablement Stack enabled (likewise Linux kernel 5.13). As of Linux kernel 5.16 the problem is fixed in any case, as it is in kernels 5.15.25 and 5.15.26. As a workaround, we recommend that Proxmox VE and Ubuntu systems stay on Linux kernel 5.11 for now and switch to the newer versions only once a backported bug fix is available for the respective distribution.
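The affected version range stated above can be expressed as a small version check. The following is a minimal sketch and not part of the original analysis: the function names `parse_kernel_release` and `is_potentially_affected` are hypothetical, and the check only looks at the upstream version number in a release string such as the one printed by `uname -r`. Distribution kernels may carry backported changes that the version number does not reveal, so the result is a first hint only.

```python
import re
from typing import Tuple

# Upstream range named in this article: 5.13.0 introduced the regression,
# 5.15.20 contains the fix.
FIRST_AFFECTED = (5, 13, 0)
FIRST_FIXED = (5, 15, 20)


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string like '5.13.19-4-pve'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '5.16') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_potentially_affected(release: str) -> bool:
    """Return True if the upstream version lies in the affected range.

    Assumption: only the upstream version number is compared; backported
    fixes in distribution kernels are not detected.
    """
    return FIRST_AFFECTED <= parse_kernel_release(release) < FIRST_FIXED
```

For the test system shown in this article, `is_potentially_affected("5.13.19-4-pve")` returns `True`, while a 5.11 kernel release string yields `False`.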
We have informed the developers of Ubuntu and Proxmox VE about this bug:
Test system:
Note: In the longterm Linux kernel version 5.15, the following releases contained changes to the mlx5 driver:
With Linux kernel 5.11 the system runs stably. The problems may have been caused by the extensive changes to the mlx5 driver in kernels 5.12 and 5.13:
With kernel 5.11, the output of ethtool -m also shows the correct wavelength of 850nm (tested with firmware 14.30.1004):
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-SR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 0km
Length (SMF) : 0m
Length (50um) : 80m
Length (62.5um) : 30m
Length (Copper) : 0m
Length (OM3) : 300m
Laser wavelength : 850nm
Vendor name : FLEXOPTIX
Vendor OUI : 38:86:02
Vendor PN : P.8596.02
Vendor rev : A
Option values : 0x00 0x1a
Option : RX_LOS implemented
Option : TX_FAULT implemented
Option : TX_DISABLE implemented
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : F79BECY
Date code : 200108
Optical diagnostics support : Yes
Laser bias current : 5.874 mA
Laser output power : 0.5370 mW / -2.70 dBm
Receiver signal average optical power : 0.5630 mW / -2.49 dBm
Module temperature : 28.96 degrees C / 84.12 degrees F
Module voltage : 3.2872 V
Alarm/warning flags implemented : Yes
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser output power high alarm : Off
Laser output power low alarm : Off
Laser output power high warning : Off
Laser output power low warning : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser rx power high alarm : Off
Laser rx power low alarm : Off
Laser rx power high warning : Off
Laser rx power low warning : Off
Laser bias current high alarm threshold : 50.000 mA
Laser bias current low alarm threshold : 1.000 mA
Laser bias current high warning threshold : 40.000 mA
Laser bias current low warning threshold : 2.000 mA
Laser output power high alarm threshold : 1.2589 mW / 1.00 dBm
Laser output power low alarm threshold : 0.1175 mW / -9.30 dBm
Laser output power high warning threshold : 1.0000 mW / 0.00 dBm
Laser output power low warning threshold : 0.1479 mW / -8.30 dBm
Module temperature high alarm threshold : 90.00 degrees C / 194.00 degrees F
Module temperature low alarm threshold : -25.00 degrees C / -13.00 degrees F
Module temperature high warning threshold : 85.00 degrees C / 185.00 degrees F
Module temperature low warning threshold : -20.00 degrees C / -4.00 degrees F
Module voltage high alarm threshold : 3.6000 V
Module voltage low alarm threshold : 3.0000 V
Module voltage high warning threshold : 3.5000 V
Module voltage low warning threshold : 3.0500 V
Laser rx power high alarm threshold : 1.2589 mW / 1.00 dBm
Laser rx power low alarm threshold : 0.0490 mW / -13.10 dBm
Laser rx power high warning threshold : 1.0000 mW / 0.00 dBm
Laser rx power low warning threshold : 0.0617 mW / -12.10 dBm
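The reported wavelength is a convenient indicator for the faulty EEPROM readout: a value such as the 22560nm mentioned at the top of this article is clearly implausible for a 10G Base-SR module. The following minimal sketch extracts the `Laser wavelength` field from captured `ethtool -m` output and compares it with an expected value. The function names, the default of 850 nm, and the tolerance of 50 nm are assumptions chosen for illustration; other module types use different nominal wavelengths.

```python
import re
from typing import Optional


def laser_wavelength_nm(ethtool_output: str) -> Optional[int]:
    """Return the 'Laser wavelength' value in nm, or None if the field is absent."""
    match = re.search(r"Laser wavelength\s*:\s*(\d+)\s*nm", ethtool_output)
    return int(match.group(1)) if match else None


def wavelength_is_plausible(
    ethtool_output: str,
    expected_nm: int = 850,   # assumption: SR multimode optics as in this article
    tolerance_nm: int = 50,   # assumption: arbitrary illustrative tolerance
) -> bool:
    """Check whether the reported wavelength lies near the expected value."""
    wavelength = laser_wavelength_nm(ethtool_output)
    if wavelength is None:
        return False
    return abs(wavelength - expected_nm) <= tolerance_nm
```

The text passed to these functions would be the captured output of `ethtool -m` for the interface in question. A failed check only indicates that the module data looks wrong; it does not by itself prove that this particular driver bug is the cause.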
A link is established in principle, but monitoring tools report bit errors and the output of ethtool -m shows an incorrect wavelength:
root@pve:~# ethtool -m enp1s0f0np0
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-SR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 0km
Length (SMF) : 0m
Length (50um) : 80m
Length (62.5um) : 30m
Length (Copper) : 0m
Length (OM3) : 300m
Laser wavelength : 22560nm
Vendor name : FLEXOPTIX ____
Vendor OUI : 00:00:00
Vendor PN : ____g_______FLEX
Vendor rev : OPTI
Option values : 0x03 0x04
Option : RX_LOS implemented, inverted
Option : Linear receiver output implemented
Option : Power level 2 requirement
BR margin, max : 7%
BR margin, min : 16%
Vendor SN : ________g_______
Date code : FLEXOPTI
Optical diagnostics support : Yes
Laser bias current : 0.000 mA
Laser output power : 0.0030 mW / -25.23 dBm
Receiver signal average optical power : 0.0000 mW / -inf dBm
Module temperature : 25.33 degrees C / 77.60 degrees F
Module voltage : 3.3902 V
Alarm/warning flags implemented : No
root@pve:~# dmesg | grep "firmware version"
[ 1.718833] mlx5_core 0000:01:00.0: firmware version: 14.32.1010
[ 2.053317] mlx5_core 0000:01:00.1: firmware version: 14.32.1010
root@pve:~# uname -a
Linux pve 5.13.19-4-pve #1 SMP PVE 5.13.19-9 (Mon, 07 Feb 2022 11:01:14 +0100) x86_64 GNU/Linux
root@pve:~# modinfo mlx5_core
filename: /lib/modules/5.13.19-4-pve/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
srcversion: E8D73322E55281A023CCA31
alias: auxiliary:mlx5_core.eth
alias: pci:v000015B3d0000A2DCsv*sd*bc*sc*i*
[...]
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.sf
depends: tls,pci-hyperv-intf,mlxfw,psample
retpoline: Y
intree: Y
name: mlx5_core
vermagic: 5.13.19-4-pve SMP mod_unload modversions
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)
root@pve:~# ethtool -i enp1s0f0np0
driver: mlx5_core
version: 5.13.19-4-pve
firmware-version: 14.32.1010 (MT_2420110004)
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
root@pve:~# ethtool enp1s0f0np0
Settings for enp1s0f0np0:
    Supported ports: [ FIBRE ]
    Supported link modes: 1000baseKX/Full
                          10000baseKR/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Supported FEC modes: None RS BASER
    Advertised link modes: 1000baseKX/Full
                           10000baseKR/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Advertised FEC modes: None RS BASER
    Speed: 10000Mb/s
    Duplex: Full
    Auto-negotiation: on
    Port: FIBRE
    PHYAD: 0
    Transceiver: internal
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000004 (4)
                           link
    Link detected: yes
root@pve:~# dmidecode -t bios
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
    Vendor: American Megatrends Inc.
    Version: 0401
    Release Date: 11/19/2021
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 16 MB
    Characteristics:
        PCI is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        Boot from CD is supported
        Selectable boot is supported
        BIOS ROM is socketed
        EDD is supported
        Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
        Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
        5.25"/360 kB floppy services are supported (int 13h)
        5.25"/1.2 MB floppy services are supported (int 13h)
        3.5"/720 kB floppy services are supported (int 13h)
        3.5"/2.88 MB floppy services are supported (int 13h)
        Print screen service is supported (int 5h)
        Serial services are supported (int 14h)
        Printer services are supported (int 17h)
        CGA/mono video services are supported (int 10h)
        USB legacy is supported
        BIOS boot specification is supported
        Targeted content distribution is supported
        UEFI is supported
    BIOS Revision: 4.1
Handle 0x004E, DMI type 13, 22 bytes
BIOS Language Information
    Language Description Format: Long
    Installable Languages: 1
        en|US|iso8859-1
    Currently Installed Language: en|US|iso8859-1
root@pve:~# lspci -s 01:00.0 -vvv -nn
01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
    Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT [15b3:0004]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 98
    IOMMU group: 68
    Region 0: Memory at 380ac000000 (64-bit, prefetchable) [size=32M]
    Expansion ROM at f8c00000 [disabled] [size=1M]
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 512 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <4us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x8 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
            EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [48] Vital Product Data
        Product Name: CX4121A - ConnectX-4 LX SFP28
        Read-only fields:
            [PN] Part number: MCX4121A-XCAT
            [EC] Engineering changes: AM
            [SN] Serial number: MT2130K05338
            [V0] Vendor specific: PCIeGen3 x8
            [RV] Reserved: checksum good, 0 byte(s) reserved
        End
    Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Vector table: BAR=0 offset=00002000
        PBA: BAR=0 offset=00003000
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration-, Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
        IOVSta: Migration-
        Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
        VF offset: 2, stride: 1, Device ID: 1016
        Supported Page Size: 000007ff, System Page Size: 00000001
        Region 0: Memory at 00000380ae800000 (64-bit, prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Capabilities: [1c0 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [230 v1] Access Control Services
        ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    Kernel driver in use: mlx5_core
    Kernel modules: mlx5_core
root@pve:~# lspci -s 01:00.1 -vvv -nn
01:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
    Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT [15b3:0004]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin B routed to IRQ 201
    IOMMU group: 69
    Region 0: Memory at 380aa000000 (64-bit, prefetchable) [size=32M]
    Expansion ROM at f8b00000 [disabled] [size=1M]
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 512 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <4us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x8 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
            EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [48] Vital Product Data
        Product Name: CX4121A - ConnectX-4 LX SFP28
        Read-only fields:
            [PN] Part number: MCX4121A-XCAT
            [EC] Engineering changes: AM
            [SN] Serial number: MT2130K05338
            [V0] Vendor specific: PCIeGen3 x8
            [RV] Reserved: checksum good, 0 byte(s) reserved
        End
    Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Vector table: BAR=0 offset=00002000
        PBA: BAR=0 offset=00003000
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration-, Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
        IOVSta: Migration-
        Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 01
        VF offset: 9, stride: 1, Device ID: 1016
        Supported Page Size: 000007ff, System Page Size: 00000001
        Region 0: Memory at 00000380ae000000 (64-bit, prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Capabilities: [230 v1] Access Control Services
        ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    Kernel driver in use: mlx5_core
    Kernel modules: mlx5_core
root@pve:~# cat /proc/partitions
major minor  #blocks  name
   8     0  234431064 sda
   8     1       1007 sda1
   8     2     524288 sda2
   8     3  233905735 sda3
 253     0    7340032 dm-0
 253     1   58458112 dm-1
 253     2    1515520 dm-2
 253     3  148299776 dm-3
 253     4  148299776 dm-4
  11     0    1048575 sr0
root@pve:~#
As next steps, we will arrange the following:
Author: Werner Fischer
Werner Fischer works in the Product Management team at Thomas-Krenn. He evaluates the latest technologies and shares his knowledge in technical articles, at conferences, and in the Thomas-Krenn Wiki. In 2005, one year after completing his degree in Computer and Media Security at FH Hagenberg, he joined the Bavarian server manufacturer. A fan of public transport, he likes to travel by bus and train and enjoys his morning walk to the office.