Dell PowerEdge T40 EMC PowerEdge Servers Troubleshooting Guide - Page 86
1. Discard Preserved Cache, if it exists.
2. Clear foreign configurations, if any.
3. Delete the array.
4. Shift the position of the drives by one: move Disk 0 to slot 1, Disk 1 to slot 2, and Disk 2 to slot 0.
5. Recreate the array as desired.
6. Perform a Full Initialization of the array (not a Fast Initialization).
7. Perform a Check Consistency on the array.

If the Check Consistency completes without errors, you can safely assume that the array is now healthy and the puncture is removed. Data can now be restored to the healthy array.

Preventing problems before they happen and solving punctures after they occur

Dell's RAID controllers contain a number of features to prevent many types of problems and to handle a variety of errors that do occur. The primary job of a RAID controller is to preserve the integrity of the data contained on its array(s). Even in more extreme cases of damage (such as punctures), the array's data is often still available and the server can remain in production.

Part of any maintenance plan should be the proactive maintenance of the RAID arrays. Dell's RAID controllers are highly reliable and very good at managing their arrays without user intervention. However, disregarding proper maintenance can cause even the most sophisticated technologies to experience problems over time. Several practices help maintain the health of arrays and prevent the majority of data errors, double faults, and punctures.

Routine, regular maintenance is highly recommended. Proactive maintenance can correct existing errors and prevent some errors from occurring. It is not possible to prevent all errors, but most serious errors can be mitigated significantly with proactive maintenance. For storage and RAID subsystems, these steps are:

• Update drivers and firmware on controllers, hard drives, backplanes, and other devices.
• Perform routine Check Consistency operations (Dell recommends every 30 days).
• Inspect cabling for signs of wear and damage, and ensure good connections.
• Review logs for indications of problems. This does not have to be a high-level technical review; it can simply be a cursory look through the logs for extremely obvious indications of potential problems.

Contact Dell Technical Support with any questions or concerns.

Troubleshooting thermal issues

Thermal issues can occur due to malfunctioning ambient temperature sensors, malfunctioning fans, dusty heat sinks, malfunctioning thermal sensors, and so on.

To resolve thermal issues:

1. Check the LCD and Embedded System Management (ESM) logs for any additional error messages that identify the faulty component.
2. Ensure that airflow to the machine is not blocked. Placing it in an enclosed area or blocking the air vents can cause it to overheat. If it is installed in a rack, ensure that the rack cooling system is working normally.
3. Check that the ambient temperature is within acceptable levels.
4. Check the internal system fans for obstructions and ensure that all fans are spinning properly. Swap any failing fan with a known-good fan for testing.
5. Ensure that all the required shrouds and blanks are installed.
6. Check that all the fans are functioning properly, the heat sink is installed correctly, and thermal grease is applied.

Input/Output errors while reseating SAS IOM storage sled on hardware configurations

Reseating the SAS IOM/storage sled on the following hardware configurations, set up as Failover clusters with shared storage and multipath enabled, results in I/O errors.

MX7000 chassis with compute nodes as cluster nodes and MX5016s sled for

86 Troubleshooting hardware issues