Mellanox ConnectX-4/5/6 bit errors as of Linux kernel 5.13

On Linux systems with NVIDIA Mellanox network cards of the ConnectX-4/5/6 series, bit errors occur with Linux kernel versions 5.13.*, 5.14.* and 5.15.0 - 5.15.19 when optical transceivers are used. The output of ethtool -m [INTERFACE] shows a laser wavelength of 22560nm instead of the usual 850nm. Configurations with DAC cables do not show these problems.

The cause is the driver change net/mlx5: Refactor module EEPROM query[1], which was introduced with Linux kernel 5.13.0[2]. The problem was fixed in Linux kernel 5.15.20[3] with the driver change net/mlx5e: Fix module EEPROM query[4]. The problem is definitely fixed as of Linux kernel 5.16, and likewise in kernels 5.15.25 and 5.15.26.

Potentially affected are Proxmox VE 7.1 systems (Linux kernel 5.13) and Ubuntu 20.04 LTS systems with the Ubuntu LTS Hardware Enablement Stack enabled (also Linux kernel 5.13). As a workaround, we recommend that Proxmox VE and Ubuntu systems stay on Linux kernel 5.11 for now and switch to the newer versions only once a backported bugfix is available for the respective distribution.
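
The affected version range described above can be turned into a small check. The following is a minimal sketch, not an official tool: the function names parse_kernel_release and is_potentially_affected are illustrative, and the check only encodes the upstream version boundaries named in this article (5.13.0 up to and including 5.15.19). It cannot know whether a distribution kernel inside that range already carries a backported fix.

```python
import re


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '5.13.19-4-pve'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '5.16') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_potentially_affected(release: str) -> bool:
    """True if the version lies in the range 5.13.0 .. 5.15.19 named in this article.

    Assumption: only the upstream version number is evaluated; distribution
    suffixes such as '-pve' or '-generic' are ignored.
    """
    return (5, 13, 0) <= parse_kernel_release(release) < (5, 15, 20)
```

On a running system, the release string could be obtained with platform.release() from the Python standard library (the same value that uname -r prints), for example is_potentially_affected(platform.release()).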

Bug Reports

We have informed the developers of Ubuntu and Proxmox VE about this bug.

Affected systems

Affected hardware

Test system:

  • System: ASUS RS500A-E10-RS12U (BIOS version 0401)
    • Also affected: Supermicro H11DSi-NT and X11DPi-NT (BIOS 3.5 with 1x Intel Xeon Silver 4110 CPU)
  • CPU: AMD EPYC 7443 24-Core Processor
  • RAM: (information to follow)
  • NVIDIA Mellanox NICs:
    • Mellanox ConnectX-4 Lx dual-port 10GbE MCX4121A-XCAT network card (firmware 14.30.1004 and 14.32.1010)
      • FLEXOPTIX P.8596.02 transceiver
    • Mellanox ConnectX-5 (exact model and firmware information to follow)
    • Mellanox ConnectX-6 (exact model and firmware information to follow)

Affected software

  • OS:
    • Proxmox VE with Linux kernel 5.13 (works without problems with Linux kernel 5.11)
    • Ubuntu 20.04.4 LTS
      • with HWE kernel 5.13.0-30-generic
      • with kernel 5.15.0-22-generic (via ppa:canonical-kernel-team/proposed)[5]
    • Debian 11:
      • Debian kernel 5.10: no problem
      • Vanilla kernel 5.12.19: no problem
      • Vanilla kernel 5.13.0: problem occurs
      • Vanilla kernel 5.15.0: problem occurs
      • Vanilla kernel 5.15.12: problem occurs
      • Vanilla kernel 5.15.20: no problem -> the problem is therefore presumably resolved by one of the longterm Linux kernel changes noted below (5.15.13, 5.15.14, 5.15.17 or 5.15.20)
      • Vanilla kernel 5.15.25: no problem
      • Vanilla kernel 5.15.26: no problem
      • Vanilla kernel 5.16.12: no problem
      • Vanilla kernel 5.17.0-rc7: no problem

Note: In the longterm Linux kernel 5.15 series, the releases 5.15.13, 5.15.14, 5.15.17 and 5.15.20 contained changes to the mlx5 driver.

Workaround

With Linux kernel 5.11 the system runs stably. The problems may have been caused by the extensive changes to the mlx5 driver in kernels 5.12 and 5.13.

With kernel 5.11, the output of ethtool -m also shows the correct wavelength of 850nm (tested with firmware 14.30.1004):

	Identifier                                : 0x03 (SFP)
	Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
	Connector                                 : 0x07 (LC)
	Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
	Transceiver type                          : 10G Ethernet: 10G Base-SR
	Encoding                                  : 0x06 (64B/66B)
	BR, Nominal                               : 10300MBd
	Rate identifier                           : 0x00 (unspecified)
	Length (SMF,km)                           : 0km
	Length (SMF)                              : 0m
	Length (50um)                             : 80m
	Length (62.5um)                           : 30m
	Length (Copper)                           : 0m
	Length (OM3)                              : 300m
	Laser wavelength                          : 850nm
	Vendor name                               : FLEXOPTIX
	Vendor OUI                                : 38:86:02
	Vendor PN                                 : P.8596.02
	Vendor rev                                : A
	Option values                             : 0x00 0x1a
	Option                                    : RX_LOS implemented
	Option                                    : TX_FAULT implemented
	Option                                    : TX_DISABLE implemented
	BR margin, max                            : 0%
	BR margin, min                            : 0%
	Vendor SN                                 : F79BECY
	Date code                                 : 200108
	Optical diagnostics support               : Yes
	Laser bias current                        : 5.874 mA
	Laser output power                        : 0.5370 mW / -2.70 dBm
	Receiver signal average optical power     : 0.5630 mW / -2.49 dBm
	Module temperature                        : 28.96 degrees C / 84.12 degrees F
	Module voltage                            : 3.2872 V
	Alarm/warning flags implemented           : Yes
	Laser bias current high alarm             : Off
	Laser bias current low alarm              : Off
	Laser bias current high warning           : Off
	Laser bias current low warning            : Off
	Laser output power high alarm             : Off
	Laser output power low alarm              : Off
	Laser output power high warning           : Off
	Laser output power low warning            : Off
	Module temperature high alarm             : Off
	Module temperature low alarm              : Off
	Module temperature high warning           : Off
	Module temperature low warning            : Off
	Module voltage high alarm                 : Off
	Module voltage low alarm                  : Off
	Module voltage high warning               : Off
	Module voltage low warning                : Off
	Laser rx power high alarm                 : Off
	Laser rx power low alarm                  : Off
	Laser rx power high warning               : Off
	Laser rx power low warning                : Off
	Laser bias current high alarm threshold   : 50.000 mA
	Laser bias current low alarm threshold    : 1.000 mA
	Laser bias current high warning threshold : 40.000 mA
	Laser bias current low warning threshold  : 2.000 mA
	Laser output power high alarm threshold   : 1.2589 mW / 1.00 dBm
	Laser output power low alarm threshold    : 0.1175 mW / -9.30 dBm
	Laser output power high warning threshold : 1.0000 mW / 0.00 dBm
	Laser output power low warning threshold  : 0.1479 mW / -8.30 dBm
	Module temperature high alarm threshold   : 90.00 degrees C / 194.00 degrees F
	Module temperature low alarm threshold    : -25.00 degrees C / -13.00 degrees F
	Module temperature high warning threshold : 85.00 degrees C / 185.00 degrees F
	Module temperature low warning threshold  : -20.00 degrees C / -4.00 degrees F
	Module voltage high alarm threshold       : 3.6000 V
	Module voltage low alarm threshold        : 3.0000 V
	Module voltage high warning threshold     : 3.5000 V
	Module voltage low warning threshold      : 3.0500 V
	Laser rx power high alarm threshold       : 1.2589 mW / 1.00 dBm
	Laser rx power low alarm threshold        : 0.0490 mW / -13.10 dBm
	Laser rx power high warning threshold     : 1.0000 mW / 0.00 dBm
	Laser rx power low warning threshold      : 0.0617 mW / -12.10 dBm

Problem

A link is established in principle, but monitoring tools report bit errors and the output of ethtool -m shows an incorrect wavelength:

root@pve:~# ethtool -m enp1s0f0np0
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 10G Ethernet: 10G Base-SR
        Encoding                                  : 0x06 (64B/66B)
        BR, Nominal                               : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 0km
        Length (SMF)                              : 0m
        Length (50um)                             : 80m
        Length (62.5um)                           : 30m
        Length (Copper)                           : 0m
        Length (OM3)                              : 300m
        Laser wavelength                          : 22560nm
        Vendor name                               : FLEXOPTIX   ____
        Vendor OUI                                : 00:00:00
        Vendor PN                                 : ____g_______FLEX
        Vendor rev                                : OPTI
        Option values                             : 0x03 0x04
        Option                                    : RX_LOS implemented, inverted
        Option                                    : Linear receiver output implemented
        Option                                    : Power level 2 requirement
        BR margin, max                            : 7%
        BR margin, min                            : 16%
        Vendor SN                                 : ________g_______
        Date code                                 : FLEXOPTI
        Optical diagnostics support               : Yes
        Laser bias current                        : 0.000 mA
        Laser output power                        : 0.0030 mW / -25.23 dBm
        Receiver signal average optical power     : 0.0000 mW / -inf dBm
        Module temperature                        : 25.33 degrees C / 77.60 degrees F
        Module voltage                            : 3.3902 V
        Alarm/warning flags implemented           : No
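
The symptom shown in this output can also be detected automatically. The following is a minimal sketch under stated assumptions: it expects the "key : value" text layout of ethtool -m as reproduced in this article, the function names are illustrative, and the expected value of 850 nm applies only to the 10G Base-SR transceiver used here (other optics legitimately report other wavelengths).

```python
import re
from typing import Optional

EXPECTED_WAVELENGTH_NM = 850  # assumption: 10G Base-SR module as used in this article


def parse_module_info(text: str) -> dict[str, list[str]]:
    """Parse 'key : value' lines of `ethtool -m` output.

    Keys such as 'Option' occur several times, so values are collected in lists.
    Lines without the ' : ' separator (e.g. a shell prompt) are skipped.
    """
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def laser_wavelength_nm(fields: dict[str, list[str]]) -> Optional[int]:
    """Return the reported laser wavelength in nm, or None if it is absent or unparsable."""
    values = fields.get("Laser wavelength")
    if not values:
        return None
    match = re.fullmatch(r"(\d+)\s*nm", values[0])
    return int(match.group(1)) if match else None


def wavelength_mismatch(text: str, expected_nm: int = EXPECTED_WAVELENGTH_NM) -> bool:
    """True if a wavelength is reported and differs from the expected value."""
    wavelength = laser_wavelength_nm(parse_module_info(text))
    return wavelength is not None and wavelength != expected_nm
```

Fed with the output above, wavelength_mismatch would flag the reported 22560nm, while the kernel 5.11 output with 850nm would pass. The text could be captured, for instance, with subprocess.run(["ethtool", "-m", interface], capture_output=True, text=True).stdout.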

Test system details

root@pve:~# dmesg | grep "firmware version"
[    1.718833] mlx5_core 0000:01:00.0: firmware version: 14.32.1010
[    2.053317] mlx5_core 0000:01:00.1: firmware version: 14.32.1010
root@pve:~# uname -a
Linux pve 5.13.19-4-pve #1 SMP PVE 5.13.19-9 (Mon, 07 Feb 2022 11:01:14 +0100) x86_64 GNU/Linux
root@pve:~# modinfo mlx5_core
filename:       /lib/modules/5.13.19-4-pve/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
license:        Dual BSD/GPL
description:    Mellanox 5th generation network adapters (ConnectX series) core driver
author:         Eli Cohen <eli@mellanox.com>
srcversion:     E8D73322E55281A023CCA31
alias:          auxiliary:mlx5_core.eth
alias:          pci:v000015B3d0000A2DCsv*sd*bc*sc*i*
[...]
alias:          pci:v000015B3d00001011sv*sd*bc*sc*i*
alias:          auxiliary:mlx5_core.eth-rep
alias:          auxiliary:mlx5_core.sf
depends:        tls,pci-hyperv-intf,mlxfw,psample
retpoline:      Y
intree:         Y
name:           mlx5_core
vermagic:       5.13.19-4-pve SMP mod_unload modversions
parm:           debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm:           prof_sel:profile selector. Valid range 0 - 2 (uint)
root@pve:~# ethtool -i enp1s0f0np0
driver: mlx5_core
version: 5.13.19-4-pve
firmware-version: 14.32.1010 (MT_2420110004)
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
root@pve:~# ethtool enp1s0f0np0
Settings for enp1s0f0np0:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseKX/Full
                                10000baseKR/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None        RS      BASER
        Advertised link modes:  1000baseKX/Full
                                10000baseKR/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: None       RS      BASER
        Speed: 10000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000004 (4)
                               link
        Link detected: yes
root@pve:~# dmidecode -t bios
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 0401
        Release Date: 11/19/2021
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 16 MB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
                Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
                5.25"/360 kB floppy services are supported (int 13h)
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 kB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                CGA/mono video services are supported (int 10h)
                USB legacy is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 4.1

Handle 0x004E, DMI type 13, 22 bytes
BIOS Language Information
        Language Description Format: Long
        Installable Languages: 1
                en|US|iso8859-1
        Currently Installed Language: en|US|iso8859-1

root@pve:~# lspci -s 01:00.0 -vvv -nn
01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
        Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT [15b3:0004]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 98
        IOMMU group: 68
        Region 0: Memory at 380ac000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at f8c00000 [disabled] [size=1M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x8 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [48] Vital Product Data
                Product Name: CX4121A - ConnectX-4 LX SFP28
                Read-only fields:
                        [PN] Part number: MCX4121A-XCAT
                        [EC] Engineering changes: AM
                        [SN] Serial number: MT2130K05338
                        [V0] Vendor specific: PCIeGen3 x8
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                End
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 2, stride: 1, Device ID: 1016
                Supported Page Size: 000007ff, System Page Size: 00000001
                Region 0: Memory at 00000380ae800000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [1c0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [230 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

root@pve:~# lspci -s 01:00.1 -vvv -nn
01:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
        Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT [15b3:0004]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 201
        IOMMU group: 69
        Region 0: Memory at 380aa000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at f8b00000 [disabled] [size=1M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x8 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [48] Vital Product Data
                Product Name: CX4121A - ConnectX-4 LX SFP28
                Read-only fields:
                        [PN] Part number: MCX4121A-XCAT
                        [EC] Engineering changes: AM
                        [SN] Serial number: MT2130K05338
                        [V0] Vendor specific: PCIeGen3 x8
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                End
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
                IOVSta: Migration-
                Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 01
                VF offset: 9, stride: 1, Device ID: 1016
                Supported Page Size: 000007ff, System Page Size: 00000001
                Region 0: Memory at 00000380ae000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [230 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

root@pve:~# cat /proc/partitions
major minor  #blocks  name

   8        0  234431064 sda
   8        1       1007 sda1
   8        2     524288 sda2
   8        3  233905735 sda3
 253        0    7340032 dm-0
 253        1   58458112 dm-1
 253        2    1515520 dm-2
 253        3  148299776 dm-3
 253        4  148299776 dm-4
  11        0    1048575 sr0
root@pve:~#
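
When several hosts have to be checked, the driver and firmware information from ethtool -i shown above can be collected programmatically. The following is a minimal sketch that assumes the "key: value" layout printed above; the function names are illustrative, and only the command output (without the shell prompt line) should be passed in.

```python
from typing import Optional


def parse_driver_info(text: str) -> dict[str, str]:
    """Parse `ethtool -i` output into a dict, splitting each line at the first colon."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Values may be empty (e.g. 'expansion-rom-version:') or contain
            # further colons (e.g. 'bus-info: 0000:01:00.0').
            info[key.strip()] = value.strip()
    return info


def uses_mlx5(info: dict[str, str]) -> bool:
    """True if the interface is driven by mlx5_core, the driver discussed in this article."""
    return info.get("driver") == "mlx5_core"


def firmware_version(info: dict[str, str]) -> Optional[str]:
    """Return the bare firmware version, e.g. '14.32.1010' from '14.32.1010 (MT_2420110004)'."""
    parts = info.get("firmware-version", "").split()
    return parts[0] if parts else None
```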

Next steps

As next steps we will arrange the following:

  • Filing a bug report with Ubuntu stating that Linux kernel versions 5.13, 5.14 and 5.15 (Ubuntu kernel 5.15.0-22-generic via ppa:canonical-kernel-team/proposed) are affected by this problem. Linux kernel 5.16 resolves the problem. However, since Linux kernel 5.15 will ship with Ubuntu 22.04, the bugfix should be backported, specifically to:
    • Linux kernel 5.13 (current HWE kernel for Ubuntu 20.04 LTS)
    • Linux kernel 5.15 (GA kernel for Ubuntu 22.04 LTS and future HWE kernel for Ubuntu 20.04 LTS) - e.g. by using vanilla kernel 5.15.26 as the base

References

  1. net/mlx5: Refactor module EEPROM query (git.kernel.org, 2021-04-11)
  2. ChangeLog-5.13 (cdn.kernel.org) [...]net/mlx5: Refactor module EEPROM query[...]
  3. ChangeLog-5.15.20 (cdn.kernel.org) [...]net/mlx5e: Fix module EEPROM query[...]
  4. net/mlx5e: Fix module EEPROM query (git.kernel.org, 2022-02-05)
  5. Install/Upgrade Linux Kernel 5.15 on Ubuntu 20.04 LTS (www.linuxcapable.com, 2022-02-08)



Author: Werner Fischer

Werner Fischer works in the Product Management team at Thomas-Krenn. He evaluates the latest technologies and shares his knowledge in technical articles, at conferences and in the Thomas-Krenn Wiki. He joined the Bavarian server manufacturer back in 2005, one year after completing his degree in Computer and Media Security at FH Hagenberg. A fan of public transport, he enjoys travelling by bus and train as well as his morning walk to the office.

