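Suzhou Tianjian service engineers recently resolved a VMware ESXi virtualization failure caused by a firmware bug in an HBA card. Based on this case, Suzhou Tianjian service engineers recommend that before any VMware ESXi host goes into production, users complete a hardware compatibility check and update firmware and drivers to the versions required by the VMware compatibility list. This maximizes service reliability and heads off avoidable business risk.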
1. Querying the VMware compatibility list
https://www.vmware.com/resources/compatibility/search.php
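You can usually locate your device by searching for its hardware model or by filtering manually, and then view its compatibility requirements. For a more precise match, search by VID and DID (PCI vendor ID and device ID).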
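The search form also accepts SVID and SSID (subsystem vendor ID and subsystem ID), and all four values can be read from the vmkchdev -l command used in section 2. A minimal sketch, assuming the usual vmkchdev -l column layout (PCI address, VID:DID, SVID:SSID, owner, alias); vcg_ids is a hypothetical helper name and the IDs in the comment are placeholders, not real values for any card.

```shell
# vcg_ids: hypothetical helper. Reads "vmkchdev -l" lines on stdin and prints
# the VID/DID/SVID/SSID values used by the compatibility-list search.
# Assumed input layout (placeholder IDs, not a real device):
#   0000:4b:00.0 1111:2222 3333:4444 vmkernel vmhba3
vcg_ids() {
    awk 'NF >= 5 {
        split($2, id, ":")   # vendor ID : device ID
        split($3, ss, ":")   # subsystem vendor ID : subsystem ID
        print $NF ": VID=" id[1] " DID=" id[2] " SVID=" ss[1] " SSID=" ss[2]
    }'
}
```

On the host it would be used as vmkchdev -l | grep vmhba3 | vcg_ids.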
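2. Checking the current firmware and driver versions
Using a Dell server as an example: after updating the BIOS and related firmware, check whether the firmware and driver versions of I/O devices such as HBAs, NICs and RAID controllers meet the compatibility requirements. Enable the ESXi SSH service before you start.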
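2.1. HBA
esxcfg-scsidevs -a # list the storage adapters (vmhba devices) on the host
esxcli storage san fc list # list Fibre Channel adapter details
vmkchdev -l | grep vmhbaN # show the HBA's VID, DID and related IDs (replace vmhbaN with the adapter name, e.g. vmhba3)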
[root@localhost:/tmp] esxcfg-scsidevs -a
vmhba0 lsi_mr3 link-n/a sas.5f4ee0806a463900 (0000:65:00.0) Broadcom / LSI PERC H755 Front
vmhba1 vmw_ahci link-n/a sata.vmhba1 (0000:00:11.5) Intel Corporation Lewisburg SATA AHCI Controller
vmhba2 vmw_ahci link-n/a sata.vmhba2 (0000:00:17.0) Intel Corporation Lewisburg SATA AHCI Controller
vmhba3 lpfc link-up fc.200070b7e401f9fe:100070b7e401f9fe (0000:4b:00.0) Emulex Corporation Emulex LightPulse LPe31000/LPe32000 PCIe Fibre Channel Adapter
vmhba4 lpfc link-up fc.200070b7e401f9f8:100070b7e401f9f8 (0000:98:00.0) Emulex Corporation Emulex LightPulse LPe31000/LPe32000 PCIe Fibre Channel Adapter
vmhba64 lpfc link-up fc.200070b7e401f9fe:100070b7e401f9fe (0000:4b:00.0) Emulex Corporation Emulex LightPulse LPe31000/LPe32000 PCIe Fibre Channel Adapter
vmhba65 lpfc link-up fc.200070b7e401f9f8:100070b7e401f9f8 (0000:98:00.0) Emulex Corporation Emulex LightPulse LPe31000/LPe32000 PCIe Fibre Channel Adapter
[root@localhost:/tmp] esxcli software vib list | grep lpfc
lpfc 14.0.622.0-1OEM.700.1.0.15843807 EMU VMwareCertified 2024-01-31
[root@localhost:/tmp] vmkload_mod -s lpfc | grep Version
Version: 14.0.622.0-1OEM.700.1.0.15843807
[root@localhost:/tmp] esxcli storage san fc list
Adapter: vmhba3
Port ID: 0D0500
Node Name: 20:00:70:b7:e4:01:f9:fe
Port Name: 10:00:70:b7:e4:01:f9:fe
Speed: 16 Gbps
Port Type: NPort
Port State: ONLINE
Model Description: Emulex LightPulse LPe31000-M6-D 1-Port 16Gb Fibre Channel Adapter
Hardware Version: 0000000c
OptionROM Version: 14.2.566.14
Firmware Version: 14.2.566.14
Driver Name: lpfc
DriverVersion: 14.0.622.0
Adapter: vmhba4
Port ID: 170500
Node Name: 20:00:70:b7:e4:01:f9:f8
Port Name: 10:00:70:b7:e4:01:f9:f8
Speed: 16 Gbps
Port Type: NPort
Port State: ONLINE
Model Description: Emulex LightPulse LPe31000-M6-D 1-Port 16Gb Fibre Channel Adapter
Hardware Version: 0000000c
OptionROM Version: 14.2.566.14
Firmware Version: 14.2.566.14
Driver Name: lpfc
DriverVersion: 14.0.622.0
Adapter: vmhba64
Port ID: 0D0500
Node Name: 20:00:70:b7:e4:01:f9:fe
Port Name: 10:00:70:b7:e4:01:f9:fe
Speed: 16 Gbps
Port Type: NPort
Port State: ONLINE
Model Description: Emulex LightPulse LPe31000-M6-D 1-Port 16Gb Fibre Channel Adapter
Hardware Version: 0000000c
OptionROM Version: 14.2.566.14
Firmware Version: 14.2.566.14
Driver Name: lpfc
DriverVersion: 14.0.622.0
Adapter: vmhba65
Port ID: 170500
Node Name: 20:00:70:b7:e4:01:f9:f8
Port Name: 10:00:70:b7:e4:01:f9:f8
Speed: 16 Gbps
Port Type: NPort
Port State: ONLINE
Model Description: Emulex LightPulse LPe31000-M6-D 1-Port 16Gb Fibre Channel Adapter
Hardware Version: 0000000c
OptionROM Version: 14.2.566.14
Firmware Version: 14.2.566.14
Driver Name: lpfc
DriverVersion: 14.0.622.0
Based on the output above, the HBA model is Emulex LightPulse LPe31000-M6-D 1-Port 16Gb Fibre Channel Adapter, with firmware version 14.2.566.14 and driver version 14.0.622.0.
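Reading these three values by eye gets tedious with four ports. A minimal sketch that condenses the esxcli storage san fc list output into one line per adapter; summarise_fc is a hypothetical helper, and it assumes, as in the output above, that the driver version is the last field of each adapter block.

```shell
# summarise_fc: hypothetical helper. Reads "esxcli storage san fc list" output
# on stdin and prints "adapter | model | fw X | driver Y" per adapter.
# Assumes the driver version line closes each adapter block.
summarise_fc() {
    awk -F': ' '
        /^ *Adapter:/           { adapter = $2 }
        /^ *Model Description:/ { model = $2 }
        /^ *Firmware Version:/  { fw = $2 }
        /^ *Driver ?Version:/   { print adapter " | " model " | fw " fw " | driver " $2 }
    '
}
```

Piping esxcli storage san fc list into summarise_fc would print one line each for vmhba3, vmhba4, vmhba64 and vmhba65.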
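According to the compatibility list, firmware 14.2.566.14 is paired with driver lpfc version 14.2.567.0. The installed driver is older than that, so the HBA driver needs to be upgraded. The NIC and RAID controller below are checked the same way.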
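The upgrade decision is a version comparison, and comparing dotted versions as plain strings goes wrong (as strings, 14.10 sorts before 14.9). A minimal sketch of a field-by-field numeric comparison; version_lt is a hypothetical helper, and dropping everything after the first hyphen (the 1OEM... build suffix) is an assumption about how these version strings are formed.

```shell
# version_lt A B: hypothetical helper; succeeds (exit 0) when dotted version A
# is strictly older than B. Text after the first "-" is ignored.
version_lt() {
    a=${1%%-*}   # "14.0.622.0-1OEM.700.1.0.15843807" -> "14.0.622.0"
    b=${2%%-*}
    awk -v a="$a" -v b="$b" 'BEGIN {
        na = split(a, x, "[.]")
        nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0             # A is older
            if (xi > yi) exit 1             # A is newer
        }
        exit 1                              # equal is not "older"
    }'
}
```

For example, version_lt 14.0.622.0 14.2.567.0 succeeds, matching the conclusion above that the lpfc driver must be upgraded. The sketch relies only on awk rather than sort -V, so it does not depend on which sort implementation the shell provides.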
2.2. NIC
esxcli network nic list # list all NICs
esxcli network nic get -n vmnicN # show details for one NIC (replace vmnicN with the NIC name, e.g. vmnic2)
vmkchdev -l | grep vmnicN # show the NIC's VID, DID and related IDs
[root@localhost:/tmp] esxcli network nic list
Name PCI Device Driver Admin Status Link Status Speed Duplex MAC Address MTU Description
------ ------------ ------ ------------ ----------- ----- ------ ----------------- ---- -----------
vmnic0 0000:04:00.0 ntg3 Up Down 0 Half ec:2a:72:f9:c3:3c 1500 Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic1 0000:04:00.1 ntg3 Up Down 0 Half ec:2a:72:f9:c3:3d 1500 Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic2 0000:31:00.0 i40en Up Up 10000 Full 6c:fe:54:8b:22:8c 1500 Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic3 0000:31:00.1 i40en Up Up 10000 Full 6c:fe:54:8b:22:8d 1500 Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic4 0000:b2:00.0 i40en Up Up 10000 Full 6c:fe:54:82:4d:50 1500 Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic5 0000:b2:00.1 i40en Up Up 10000 Full 6c:fe:54:82:4d:51 1500 Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic6 0000:b1:00.0 i40en Up Up 10000 Full 6c:fe:54:82:71:90 1500 Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic7 0000:b1:00.1 i40en Up Down 0 Half 6c:fe:54:82:71:91 1500 Intel(R) Ethernet Controller X710 for 10GbE SFP+
[root@localhost:/tmp] esxcli network nic get -n vmnic2
Advertised Auto Negotiation: true
Advertised Link Modes: Auto, 10000BaseSR/Full
Auto Negotiation: true
Cable Type: FIBRE
Current Message Level: 0
Driver Info:
Bus Info: 0000:31:00:0
Driver: i40en
Firmware Version: 9.40 0x8000e9bd 22.5.7
Version: 2.3.4.0
Link Detected: true
Link Status: Up
Name: vmnic2
PHYAddress: 0
Pause Autonegotiate: false
Pause RX: false
Pause TX: false
Supported Ports: FIBRE
Supports Auto Negotiation: true
Supports Pause: true
Supports Wakeon: true
Transceiver:
Virtual Address: 00:50:56:5a:59:3e
Wakeon: MagicPacket(tm)
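Ports that share a driver also share its loaded version, so one esxcli network nic get per driver is enough to read the driver version; firmware, by contrast, belongs to each physical card and should be checked per adapter. A minimal sketch that groups the esxcli network nic list output by driver; nic_drivers is a hypothetical helper and assumes the two header lines shown above.

```shell
# nic_drivers: hypothetical helper. Reads "esxcli network nic list" output on
# stdin and prints "driver: vmnicA vmnicB ..." in order of first appearance.
nic_drivers() {
    awk '
        NR <= 2 { next }                        # skip header and dashed separator
        NF < 3  { next }                        # skip blank or short lines
        !($3 in ports) { order[++n] = $3 }      # column 3 is the driver
        { ports[$3] = ports[$3] " " $1 }
        END { for (i = 1; i <= n; i++) print order[i] ":" ports[order[i]] }
    '
}
```

With the output above it would print ntg3: vmnic0 vmnic1 and i40en: vmnic2 vmnic3 vmnic4 vmnic5 vmnic6 vmnic7.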
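2.3. RAID controller
esxcfg-scsidevs -a # list the storage adapters (vmhba devices) on the host
esxcli storage san sas list # show the RAID (SAS) controller details
vmkchdev -l | grep vmhbaN # show the RAID controller's VID, DID and related IDs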
[root@localhost:/tmp] esxcli storage san sas list
Device Name: vmhba0
SAS Address: 5f:4e:e0:80:6a:46:39:00
Physical ID: 0
Minimum Link Rate: 0 Mbps
Maximum Link Rate: 0 Mbps
Negotiated Link Rate: 0 Mbps
Model Description: PERC H755 Front
Hardware Version: A
OptionROM Version: 7.26.00.0_0x071A0000
Firmware Version: 52.26.0-5179
Driver Name: lsi_mr3
Driver Version: 7.722.02.00
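The RAID controller output is a flat list of Key: value lines, so a single field can be pulled out directly. A minimal sketch; kv_get is a hypothetical helper that prints the value of every line whose key matches exactly, so piping esxcli storage san sas list into kv_get 'Driver Version' would print 7.722.02.00 for the output above.

```shell
# kv_get KEY: hypothetical helper. Reads "Key: value" lines on stdin and prints
# the value of each line whose key is exactly KEY (leading spaces ignored).
kv_get() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)                # ignore leading indentation
            if (index(line, key ": ") == 1)          # exact key at line start
                print substr(line, length(key) + 3)  # value after "KEY: "
        }
    '
}
```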
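3. Performing the upgrade
Again using a Dell server as the example: hardware firmware can be updated through iDRAC, while drivers are downloaded from the VMware website and uploaded to the /tmp directory on the host. Put ESXi into maintenance mode before upgrading.
esxcli system maintenanceMode set --enable yes # enter maintenance mode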
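Before applying anything it can help to list exactly which bundles will be installed. A minimal dry-run sketch, assuming every .zip in the upload directory is a driver component bundle; plan_component_apply is a hypothetical helper that only prints the esxcli software component apply commands, one per bundle, so they can be reviewed and then run by hand.

```shell
# plan_component_apply [DIR]: hypothetical helper. Prints one
# "esxcli software component apply -d" command per .zip in DIR (default /tmp).
# Nothing is executed; the output is meant for review.
plan_component_apply() {
    dir=${1:-/tmp}
    for zip in "$dir"/*.zip; do
        [ -e "$zip" ] || continue   # no match: the glob stays literal, skip it
        printf 'esxcli software component apply -d %s\n' "$zip"
    done
}
```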
[root@localhost:/tmp] esxcli software component apply -d /tmp/Broadcom-lsi-mr3_7.726.02.00-1OEM.700.1.0.15843807_22115906.zip
Installation Result
Components Installed: Broadcom-lsi-mr3_7.726.02.00-1OEM.700.1.0.15843807
Components Removed: Broadcom-lsi-mr3_7.722.02.00-1OEM.700.1.0.15843807
Components Skipped:
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
[root@localhost:/tmp] esxcli software component apply -d /tmp/Broadcom-ELX-lpfc_14.2.567.0-1OEM.700.1.0.15843807_21768986.zip
Installation Result
Components Installed: Broadcom-ELX-lpfc_14.2.567.0-1OEM.700.1.0.15843807
Components Removed: Broadcom-ELX-lpfc_14.0.622.0-1OEM.700.1.0.15843807
Components Skipped:
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
[root@localhost:/tmp] esxcli software component apply -d /tmp/Intel-i40en_2.5.11.0-1OEM.700.1.0.15843807_22757618.zip
Installation Result
Components Installed: Intel-i40en_2.5.11.0-1OEM.700.1.0.15843807
Components Removed: Intel-i40en_2.3.4.0-1OEM.700.1.0.15843807
Components Skipped:
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
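When several bundles are applied, the results are easy to skim past. A minimal sketch that summarizes the Installation Result text: summarise_apply (a hypothetical helper) lists the installed components and exits 0 if any result reports Reboot Required: true. Saving the output of each apply command to one file and piping that file in is an assumed workflow.

```shell
# summarise_apply: hypothetical helper. Reads "esxcli software component apply"
# results on stdin, prints each installed component, and exits 0 when any
# result says a reboot is required (1 otherwise).
summarise_apply() {
    awk -F': ' '
        /^ *Components Installed:/ { print "installed: " $2 }
        /^ *Reboot Required: true/ { reboot = 1 }
        END { if (reboot) exit 0; exit 1 }
    '
}
```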
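After all required drivers have been updated, reboot the ESXi host with the reboot command. Once it is back up, repeat the checks above to confirm that each driver is at the target version.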
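The post-reboot check can compare esxcli software vib list against the target versions. A minimal sketch; check_vib is a hypothetical helper that assumes the column layout shown in section 2.1 (name, then version with a -1OEM... build suffix). The VIB name for each driver should be taken from the actual list, since it does not always match the module name.

```shell
# check_vib NAME WANT: hypothetical helper. Reads "esxcli software vib list"
# output on stdin and reports whether VIB NAME is at version WANT
# (build suffix after the first "-" ignored). Exits 0 only on a match.
check_vib() {
    awk -v name="$1" -v want="$2" '
        $1 == name {
            found = 1
            split($2, v, "-")   # "14.2.567.0-1OEM..." -> v[1] = "14.2.567.0"
            if (v[1] == want) {
                print name " OK (" v[1] ")"
                ok = 1
            } else {
                print name " is " v[1] ", expected " want
            }
        }
        END {
            if (!found) print name " not found"
            if (ok) exit 0
            exit 1
        }
    '
}
```

After the upgrade, piping esxcli software vib list into check_vib lpfc 14.2.567.0 should print lpfc OK (14.2.567.0).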
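When all upgrades are finished, exit maintenance mode:
esxcli system maintenanceMode set --enable no # exit maintenance mode
You can also check the host's current maintenance mode state with the following command: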
esxcli system maintenanceMode get
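The same state check can gate the upgrade commands in step 3. A minimal sketch, assuming esxcli system maintenanceMode get prints a single word (Enabled or Disabled); require_maintenance is a hypothetical helper that refuses to continue unless the host is in maintenance mode.

```shell
# require_maintenance STATE: hypothetical helper. STATE is the output of
# "esxcli system maintenanceMode get"; returns 0 only when it is "Enabled".
require_maintenance() {
    case "$1" in
        Enabled) return 0 ;;
        *) echo "host not in maintenance mode (state: ${1:-unknown}); aborting" >&2
           return 1 ;;
    esac
}
```

A guarded apply would then read: require_maintenance "$(esxcli system maintenanceMode get)" && esxcli software component apply -d /tmp/Intel-i40en_2.5.11.0-1OEM.700.1.0.15843807_22757618.zip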
4. References
https://www.dell.com/support/kbdoc/zh-cn/000194101/how-to-install-vmware-vsphere-esxi7-0-drivers
https://blog.csdn.net/jkxiaoshao/article/details/120609303
https://blog.csdn.net/fq3758/article/details/107791920
https://blog.csdn.net/fq3758/article/details/107616042