
Dell PowerEdge T40 EMC PowerEdge Servers Troubleshooting Guide - Page 86

Preventing problems before they happen and solving punctures after they occur



Section	Page
Dell EMC PowerEdge Servers Troubleshooting Guide	3
Introduction	7
Audience	7
Recommended tools	7
Documentation resources	7
Safety instructions	9
Diagnostic indicators	10
Status LED indicators	10
System health and system ID indicator codes	11
iDRAC Quick Sync 2 indicator codes	11
iDRAC Direct LED indicator codes	12
NIC indicator codes	12
Power supply unit indicator codes	13
Non-redundant power supply unit indicator codes	14
Hard drive indicator codes	15
uSATA SSD indicator codes	16
Internal dual SD module indicator codes	17
Running diagnostics	18
Receiving automated support with SupportAssist	18
PSA/ePSA Diagnostics	18
Running the PSA Diagnostics	18
PSA and ePSA Diagnostics error codes	18
Debugging mini crash dump files using by WinDbg in Windows operating system	34
Troubleshooting hardware issues	38
Troubleshooting system startup failure	38
No bootable device found	38
Troubleshooting external connections	39
Troubleshooting the video subsystem	39
Troubleshooting a USB device	39
Troubleshooting iDRAC Direct - USB XML configuration	40
Troubleshooting iDRAC Direct - Laptop connection	40
Troubleshooting a serial Input Output device	40
Troubleshooting a NIC	41
NIC teaming on a PowerEdge Server	41
Troubleshooting a wet system	41
Troubleshooting a damaged system	42
Troubleshooting the system battery	43
Troubleshooting cooling problems	43
Troubleshooting cooling fans	44
Troubleshooting an internal USB key	44
Troubleshooting a micro SD card	45
Troubleshooting expansion cards	45
Troubleshooting processors	46
Troubleshooting a CPU Machine Check error	46
Troubleshooting a storage controller	47
OMSA flagging PERC driver	47
Importing or clearing foreign configurations using the foreign configuration view screen	47
Importing or clearing foreign configurations using the VD mgmt menu	49
RAID controller L1, L2 and L3 cache error	49
PERC controllers do not support NVME PCIe drives	49
12 Gbps hard drive does not support in SAS 6ir RAID controllers	49
Hard drives cannot be added to the existing RAID 10 Array	50
PERC battery discharging	50
PERC battery failure message is displayed in ESM log	51
Creating non-raid disks for storage purpose	52
Firmware or Physical disks out-of-date	52
Cannot boot to Windows due to foreign configuration	52
Offline or missing virtual drives with preserved cache error message	52
Managing preserved cache	53
Expanding RAID array	53
LTO-4 Tape drives are not supported on PERC	53
Limitations of HDD size on H310	53
System logs show failure entry for a storage controller even though it is working correctly	53
Troubleshooting hard drives	54
Troubleshooting multiple Drive failure	54
Checking hard drive status in the PERC BIOS	55
FAQs	56
Symptoms	57
Drive timeout error	58
Drives not accessible	58
Troubleshooting an optical drive	58
Troubleshooting a tape backup unit	59
Troubleshooting no power issues	59
Troubleshooting power supply units	59
Troubleshooting power source problems	60
Troubleshooting power supply unit problems	60
Troubleshooting RAID	60
RAID configuration using PERC	60
How to create RAID volumes	60
Creating a secured virtual disk	61
Rebuild	61
Rebuilding the physical disks after multiple disks become simultaneously inaccessible	62
Importing a foreign configuration using PERC	62
Configuring hot spare	63
RAID configuration using OpenManage Server Administrator	63
Create Virtual Disk Express Wizard	63
Create virtual disk Advanced Wizard	64
RAID configuration by using Unified Server Configurator	67
Downloading and installing the RAID controller log export by using PERCCLI tool on ESXi hosts on Dell’s 13th generation of PowerEdge servers	69
Configuring RAID by using Lifecycle Controller	72
Starting and target RAID levels for virtual disk reconfiguration and capacity expansion	73
Replacing physical disks in RAID1 configuration	74
Thumb rules for RAID configuration	74
Reconfiguring or migrating virtual disks	75
Starting and target RAID levels for virtual disk reconfiguration and capacity expansion	75
Foreign Configuration Operations	76
Foreign Configuration properties	77
Viewing Patrol Read report	78
Setting Patrol Read mode	78
To set Patrol Read mode	78
Check Consistency report	79
Performing a Check Consistency	79
To locate view Check Consistency report in Storage Management	79
Virtual disk troubleshooting	79
Rebuilding of virtual disk does not work	79
Rebuilding of virtual disk completes with errors	80
Cannot create a virtual disk	80
A virtual disk of minimum size is not visible to Windows Disk Management	80
Virtual disk errors on systems running Linux	80
Problems associated with using the same physical disks for both redundant and nonredundant virtual disks	81
Enable the alarm on PERC 5/E adapter to alert in case of physical disk failures	81
RAID controller displays multibit ECC errors	81
PERC goes offline with an error message	81
Reconfiguring the RAID level and virtual disks	82
Lost shared storage access	82
Troubleshooting memory or battery errors on the PERC controller on Dell PowerEdge servers	82
Interpreting LCD and Embedded Diagnostic event messages	82
Troubleshooting conditions that lead to error message	83
Additional information for troubleshooting memory or battery errors on the PERC controller	83
Slicing	84
RAID puncture	84
Causes of RAID puncture	84
How to fix a RAID puncture	85
Preventing problems before they happen and solving punctures after they occur	86
Troubleshooting thermal issue	86
Input/Output errors while reseating SAS IOM storage sled on hardware configurations	86
Server management software issues	88
What are the different types of iDRAC licenses	88
How to activate license on iDRAC	89
Can I upgrade the iDRAC license from express to enterprise and BMC to express	89
How to find out missing licenses	89
How to export license using iDRAC web interface	90
How to set up e-mail alerts	90
System time zone is not synchronized	90
How to set up Auto Dedicated NIC feature	91
How to configure network settings using Lifecycle Controller	91
Assigning hot spare with OMSA	92
Storage Health	93
How do I configure RAID using operating system deployment wizard	93
Foreign drivers on physical disk	93
Physical disk reported as Foreign	94
Clearing the foreign configuration	94
Resetting storage-controller configuration	94
How to update BIOS on 13th generation PowerEdge servers	94
Why am I unable to update firmware	95
Which are the operating systems supported on Dell EMC PowerEdge servers	95
Unable to create a partition or locate the partition and unable to install Microsoft Windows Server 2012	95
JAVA support in iDRAC	96
How to specify language and keyboard type	96
Message Event ID - 2405	96
Installing Managed System Software On Microsoft Windows Operating Systems	97
Installing Managed System Software On Microsoft Windows Server and Microsoft Hyper-V Server	97
Installing Systems Management Software On VMware ESXi	97
Processor TEMP error	97
PowerEdge T130, R230, R330, and T330 servers may report a critical error during scheduled warm reboots	98
SSD is not detected	98
TRIM/UNMAP and Dell Enterprise SSD Drives Support	98
OpenManage Essentials does not recognize the server	98
Unable to connect to iDRAC port through a switch	99
Lifecycle Controller is not recognizing USB in UEFI mode	99
Guidance on remote desktop services	99
Troubleshooting operating system issues	100
How to install the operating system on a Dell PowerEdge Server	100
Locating the VMware and Windows licensing	100
Troubleshooting blue screen errors or BSODs	100
Troubleshooting a Purple Screen of Death or PSOD	101
Troubleshooting no boot issues for Windows operating systems	101
No boot device found error message is displayed	102
No POST issues in iDRAC	103
Troubleshooting a No POST situation	103
Migrating to OneDrive for Business using Dell Migration Suite for SharePoint	104
Windows	105
Installing and reinstalling Microsoft Windows Server 2016	105
Install Windows Server by using Dell LifeCycle Controller	105
Install Windows Server by using operating system media	106
FAQs	107
Why are the USB keyboard and mouse not detected during the Windows Server 2008 R2 SP1 installation	107
Why does the installation wizard stop responding during the Windows OS installation	108
Why does Windows OS installation using Lifecycle Controller, on PowerEdge Servers fail at times with an error message	108
Why does Windows Server 2008 R2 SP1 display a blank screen in UEFI mode after installation	108
Symptoms	108
Troubleshooting system crash at cng.sys with watchdog Error violation	109
Host bus adapter mini is missing physical disks and backplane in Windows	109
Converting evaluation OS version to retail OS version	109
Partitions on disk selected for installation of Hyper-V server 2012	110
Install Microsoft Hyper-V Server 2012 R2 with the Internal Dual SD module	110
VMware	111
FAQs	111
Rebooting an ESXi host	111
Unable to allocate storage space to a VM	111
Configuration backup and restore procedures	111
Backing up the configuration of your ESXi host	112
Restoring configuration of your ESXi host	112
Can we back up 2012 r2 as a VM	112
Install, update and manage Fusion-IO drives in Windows OS	112
Symptoms	113
Linux	113
FAQs	113
Symptoms	113
Installing operating system through various methods	113
Getting help	115
Contacting Dell EMC	115
Download the drivers and firmware	115


1. Discard Preserved Cache, if it exists.
2. Clear foreign configurations, if any.
3. Delete the array.
4. Shift the position of the drives by one. Move Disk 0 to slot 1, Disk 1 to slot 2, and Disk 2 to slot 0.
5. Recreate the array as desired.
6. Perform a Full Initialization of the array (not a Fast Initialization).
7. Perform a Check Consistency on the array.

If the Check Consistency completes without errors, you can safely assume that the array is now healthy and the puncture is removed. Data can now be restored to the healthy array.
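The drive shift in step 4 is a rotation by one slot with wrap-around. The sketch below reproduces the three-drive mapping given above. Generalizing it to other drive counts is an assumption made for illustration, not something the guide states, and `rotate_slots` is a hypothetical helper name.

```python
from typing import Dict


def rotate_slots(num_drives: int) -> Dict[int, int]:
    """Map each disk number to its new slot, shifted by one with wrap-around.

    Assumption: disks are numbered 0..num_drives-1 and currently sit in the
    slot with the same number, as in the three-drive example in the text.
    """
    if num_drives < 2:
        raise ValueError("need at least two drives to shift positions")
    return {disk: (disk + 1) % num_drives for disk in range(num_drives)}


if __name__ == "__main__":
    # Three-drive case from the procedure above.
    for disk, slot in rotate_slots(3).items():
        print(f"Move Disk {disk} to slot {slot}")
```

For three drives this prints the same moves as step 4: Disk 0 to slot 1, Disk 1 to slot 2, and Disk 2 to slot 0.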

Preventing problems before they happen and solving punctures after they occur

Dell's RAID controllers contain a number of features to prevent many types of problems and to handle a variety of errors that do occur. The primary job of a RAID controller is to preserve the integrity of the data contained on its array(s). Even in the more extreme cases of damage (such as punctures), the array's data is often available and the server can remain in production. Part of any maintenance plan should be the proactive maintenance of the RAID arrays. Dell's RAID controllers are highly reliable and very good at managing their arrays without user intervention. However, disregarding proper maintenance can cause even the most sophisticated technologies to experience problems over time. There are a number of things that can help maintain the health of arrays and prevent the majority of data errors, double faults, and punctures.

It is highly recommended to perform routine and regular maintenance. Proactive maintenance can correct existing errors and prevent some errors from occurring. It is not possible to prevent all errors from occurring, but most serious errors can be mitigated significantly with proactive maintenance. For storage and RAID subsystems these steps are:

• Update drivers and firmware on controllers, hard drives, backplanes, and other devices.
• Perform routine Check Consistency operations (Dell recommends every 30 days).
• Inspect cabling for signs of wear and damage, and ensure good connections.
• Review logs for indications of problems.

The log review does not have to be a high-level technical review; a cursory look through the logs for obvious indications of potential problems is enough. Contact Dell Technical Support with any questions or concerns.
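As a rough illustration of the 30-day Check Consistency recommendation, the sketch below flags arrays whose last recorded check is older than that interval. The array names, dates, and the `overdue_arrays` helper are hypothetical. It does not query a controller; it assumes you record the last check date yourself.

```python
from datetime import date, timedelta
from typing import Dict, List

# Interval taken from the recommendation in the text (every 30 days).
CHECK_INTERVAL = timedelta(days=30)


def overdue_arrays(last_checked: Dict[str, date], today: date) -> List[str]:
    """Return the names of arrays whose last Check Consistency is more than 30 days old."""
    return sorted(
        name for name, when in last_checked.items()
        if today - when > CHECK_INTERVAL
    )


if __name__ == "__main__":
    # Hypothetical array names and dates, for illustration only.
    history = {"vd0": date(2024, 1, 1), "vd1": date(2024, 2, 20)}
    for name in overdue_arrays(history, today=date(2024, 3, 1)):
        print(f"{name}: Check Consistency is overdue")
```

A check performed exactly 30 days ago is treated as still current; only longer gaps are flagged.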

Troubleshooting thermal issue

Thermal issues can occur due to malfunctioning ambient temperature sensors, malfunctioning fans, dusty heat sinks, malfunctioning thermal sensors, and so on.

To resolve the thermal issues:

1. Check the LCD and Embedded System Management (ESM) logs for any additional error messages to identify the faulty component.
2. Ensure that airflow to the machine is not blocked. Placing it in an enclosed area or blocking the air vent can cause it to overheat. If installed in a rack, ensure that the rack cooling system is working normally.
3. Check that the ambient temperature is within acceptable levels.
4. Check the internal system fans for obstructions and ensure that all fans are spinning properly. Swap any failing fans with a known-good fan for testing.
5. Ensure that all the required shrouds and blanks are installed.
6. Check that all the fans are functioning properly, the heat sink is installed correctly, and thermal grease is applied.
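The ambient temperature check can be sketched as a simple range test over sensor readings. The limits below are placeholders, not Dell specifications, and the sensor names and the `out_of_range` helper are hypothetical. Take the acceptable range from the technical specifications for your system.

```python
from typing import Dict, List, Tuple

# Placeholder limits in degrees Celsius. These are NOT Dell specifications;
# replace them with the values documented for your system.
AMBIENT_RANGE_C: Tuple[float, float] = (10.0, 35.0)


def out_of_range(
    readings_c: Dict[str, float],
    limits: Tuple[float, float] = AMBIENT_RANGE_C,
) -> List[str]:
    """Return the names of sensors whose reading falls outside the given limits."""
    low, high = limits
    return sorted(
        name for name, temp in readings_c.items()
        if not low <= temp <= high
    )


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    readings = {"room_probe_a": 22.5, "room_probe_b": 38.0}
    for name in out_of_range(readings):
        print(f"{name}: ambient temperature outside acceptable range")
```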

Input/Output errors while reseating SAS IOM storage sled on hardware configurations

Reseating the SAS IOM/storage sled on the following hardware configurations, set up as failover clusters with shared storage and multipath enabled, results in I/O errors.

MX7000 chassis with compute nodes as cluster nodes and MX5016s sled for
