Dell PowerEdge R6525 EMC PowerEdge Servers with NVIDIA GPUs and VMware vSphere - Page 16

Known issues and resolution


  • 17

Dell EMC PowerEdge Servers with NVIDIA GPUs and VMware vSphere | Technical white paper

4 Known issues and resolution
This section covers known issues with configuring the GPU features described in this document, along with their resolutions.
1. On a PowerEdge R730 with NVIDIA GRID K2 and ESXi 6.x, a Windows 7 64-bit VM configured with vDGA fails to boot and displays a BSOD.
   - Resolution: This is a known issue. To prevent the VM crash, set pciPassthru0.msiEnabled to False in the VM's VMX file. By default, pciPassthru0.msiEnabled is set to True.
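The VMX change described for issue 1 can be sketched as a small text edit. This is a minimal illustration, not a VMware tool: the helper name `set_vmx_option` is hypothetical, the `key = "value"` line layout and the upper-case `"FALSE"` spelling are assumptions about the VMX file, and the sample content below is made up for the example. It is assumed that the VM is powered off while its VMX file is edited.

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set (hypothetical helper).

    An existing entry for the key is replaced in place; otherwise the
    entry is appended as a new line at the end of the text.
    """
    new_line = f'{key} = "{value}"'
    # Match the whole line holding the key; [ \t]* avoids swallowing blank lines.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=.*$",
        re.IGNORECASE | re.MULTILINE,
    )
    if pattern.search(vmx_text):
        # A callable replacement keeps backslashes in the value literal.
        return pattern.sub(lambda _match: new_line, vmx_text)
    separator = "" if vmx_text.endswith("\n") or not vmx_text else "\n"
    return f"{vmx_text}{separator}{new_line}\n"


# Made-up VMX content, for illustration only.
sample_vmx = (
    'displayName = "win7-vdga"\n'
    'pciPassthru0.present = "TRUE"\n'
    'pciPassthru0.msiEnabled = "TRUE"\n'
)
updated_vmx = set_vmx_option(sample_vmx, "pciPassthru0.msiEnabled", "FALSE")
```

Working on the file content as a string keeps the sketch independent of how the VMX file is read from and written back to the datastore.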
2. A VM configured with vGPU fails to start with the following error: The available memory resources in the parent resource pool are insufficient for the operation.
   - Resolution: Verify the memory assigned to the VM. Ensure that it does not exceed the available memory or result in a memory overcommit.
3. VMs configured with vGPU cannot use vMotion and DRS functionality.
   - Resolution: With ESXi 6.0.x and 6.5.x, vMotion and similar live operations are not supported on VMs configured with vGPU. With ESXi 6.7.x, a VM configured with vGPU can use vMotion, provided the destination host has the required, supported, and compatible hardware.
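The version rule stated for issue 3 can be written down as a small lookup, for example as a pre-check in an inventory script. This is only a sketch of the rule as given in this section: the function name `vgpu_vmotion_supported` is hypothetical, and releases that the section does not mention return `None` rather than a guess. A `True` result still depends on the destination host having the required, supported, and compatible hardware.

```python
from typing import Optional

# Rule as stated in this section; other ESXi releases are deliberately absent.
_VGPU_VMOTION_BY_RELEASE = {"6.0": False, "6.5": False, "6.7": True}


def vgpu_vmotion_supported(esxi_version: str) -> Optional[bool]:
    """Return whether vMotion of a vGPU VM is supported on this ESXi release.

    Returns None when the release is not covered by this section
    (hypothetical helper, not a VMware API).
    """
    parts = esxi_version.strip().split(".")
    if len(parts) < 2:
        return None
    release = ".".join(parts[:2])  # "6.7.0" -> "6.7"
    return _VGPU_VMOTION_BY_RELEASE.get(release)
```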
4. A VM configured with vGPU fails to power on.
   - Resolution: Ensure that the X.Org service is running on the ESXi host. The service can be started and stopped either from the vSphere Web Client or over SSH on the ESXi host.
5. On the PowerEdge R740 server, after installing the vGPU VIB in ESXi, the command nvidia-smi fails to display the GPU statistics with the following error message: Failed to initialize NVML: Unknown Error
   - Resolution: This error can occur for many reasons, including misconfiguration. To resolve the issue:
     1. Ensure that the vGPU VIB was installed successfully, without any errors.
     2. Verify that the NVIDIA GPUs in the ESXi host are not configured as pass-through devices for VM DirectPath I/O or vDGA.
     3. Run the command lspci | grep -i nvidia in the ESXi shell and ensure that the output contains entries for the NVIDIA GPUs present in the server.
     4. On Dell EMC PowerEdge yx4x servers, ensure that the following System BIOS settings are configured:
        • Memory Mapped I/O above 4 GB is set to Enabled
        • Memory Mapped I/O Base is set to 512 GB
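The lspci check in step 3 of issue 5 amounts to a case-insensitive substring filter over the command output. The sketch below mirrors that filter in Python so it can be applied to output that has already been captured as text. The function name `find_devices` is hypothetical, and the sample lines are placeholders written for this example, not output captured from a server.

```python
from typing import List


def find_devices(lspci_output: str, pattern: str = "nvidia") -> List[str]:
    """Return the lspci lines that contain pattern, ignoring case.

    Mirrors a case-insensitive grep over the lspci output
    (hypothetical helper).
    """
    needle = pattern.lower()
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if needle in line.lower()
    ]


# Placeholder lines for illustration only; real output will differ.
sample_output = (
    "0000:00:00.0 Host bridge: Example Vendor\n"
    "0000:3b:00.0 3D controller: NVIDIA Corporation Example GPU\n"
)
nvidia_lines = find_devices(sample_output)
```

An empty result means that no line matched. Before concluding that the GPUs are missing, check the spelling of the pattern: a misspelled pattern such as nvida matches no NVIDIA device line.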
6. On the PowerEdge R740 server with NVIDIA Tesla T4, an attempt to configure a VM with an assigned vGPU or to perform a GPU pass-through fails.
   - Resolution: When this failure occurs, check whether the Tesla T4 is enumerated as 32 separate GPUs in ESXi. If it is, ensure that the SR-IOV capability is enabled in the server BIOS, and then retry.