Dell PowerEdge T140 - Dell EMC PowerEdge Servers Troubleshooting Guide - Page 98

Slicing, RAID puncture, Causes of RAID puncture

Slicing
Configuring multiple RAID arrays across the same set of disks is called slicing.
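
The following minimal sketch is not part of the guide; it models slicing in Python with hypothetical virtual disk names (VD0, VD1) to show the layout the definition above describes: two arrays carved out of the same set of physical drives.

    from dataclasses import dataclass

    @dataclass
    class VirtualDisk:
        """A RAID array (slice) carved out of a set of physical disks."""
        name: str
        raid_level: int
        size_gb: int
        member_slots: tuple  # slot IDs of the physical drives

    # Hypothetical sliced configuration: both virtual disks are built on
    # the same three physical drives (slots 0, 1, and 2).
    vd0 = VirtualDisk("VD0", raid_level=5, size_gb=200, member_slots=(0, 1, 2))
    vd1 = VirtualDisk("VD1", raid_level=5, size_gb=400, member_slots=(0, 1, 2))

    shared = set(vd0.member_slots) & set(vd1.member_slots)
    print(f"Sliced configuration: VD0 and VD1 share physical drives {sorted(shared)}")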
RAID puncture
A RAID puncture is a feature of the Dell PowerEdge RAID Controller (PERC) designed to allow the controller to restore the redundancy of the array despite the loss of data caused by a double fault condition. Another name for a RAID puncture is rebuild with errors. When the RAID controller detects a double fault and there is insufficient redundancy to recover the data in the impacted stripe, the controller creates a puncture in that stripe and enables the rebuild to continue.
• Any condition that causes data to be inaccessible in the same stripe on more than one drive is a double fault (see the sketch after this list).
• Double faults cause the loss of all data within the impacted stripe.
• All RAID punctures are double faults, but not all double faults are RAID punctures.
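
As an illustration only (not taken from the guide), the following Python sketch applies the definition above: a stripe is a double fault when its data is inaccessible on more than one member drive. The per-stripe health map is hypothetical.

    # Hypothetical per-stripe map: for each stripe index, the set of member
    # drives whose data in that stripe is unreadable (bad block, or a failed
    # or missing drive).
    unreadable = {
        0: set(),    # stripe 0: fully readable
        1: {1},      # stripe 1: single fault; parity can still rebuild it
        2: {1, 2},   # stripe 2: unreadable on two drives; double fault
    }

    def is_double_fault(stripe_index: int) -> bool:
        """Data is inaccessible on more than one drive in the same stripe."""
        return len(unreadable.get(stripe_index, set())) > 1

    for stripe in sorted(unreadable):
        state = "double fault (all data in the stripe is lost)" if is_double_fault(stripe) else "recoverable"
        print(f"stripe {stripe}: {state}")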
Causes of RAID puncture
Without the RAID puncture feature, the array rebuild would fail and leave the array in a degraded state. In some cases, the failures may cause additional drives to fail and leave the array in a non-functioning, offline state. Puncturing an array has no impact on the ability to boot to, or access any data on, the array.
RAID punctures can occur in one of two situations (a sketch of both follows this list):
• Double fault already exists (data already lost). A data error on an online drive is propagated (copied) to a rebuilding drive.
• Double fault does not exist (data is lost when the second error occurs). While the array is in a degraded state, if a bad block occurs on an online drive, that LBA is RAID punctured.
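
The sketch below is illustrative only (it is not PERC firmware logic): a degraded three-drive RAID 5 rebuild is walked stripe by stripe, and any stripe whose surviving data is itself unreadable is marked punctured so that the rebuild can continue instead of failing.

    # Hypothetical stripe records for a degraded 3-drive RAID 5 virtual disk:
    # drive 2 has failed and is being rebuilt from drives 0 and 1.
    stripes = [
        {"index": 0, "bad_blocks_on_online_drives": 0},  # clean stripe
        {"index": 1, "bad_blocks_on_online_drives": 1},  # error on a surviving drive
        {"index": 2, "bad_blocks_on_online_drives": 0},
    ]

    punctured = []
    for stripe in stripes:
        if stripe["bad_blocks_on_online_drives"] > 0:
            # The data needed to reconstruct this stripe is itself unreadable,
            # so redundancy cannot be restored here: puncture the stripe and
            # keep rebuilding rather than failing the whole rebuild.
            punctured.append(stripe["index"])
        # Otherwise the stripe is rebuilt on the replacement drive from the
        # surviving data and parity (reconstruction itself is not modeled).

    print(f"Rebuild finished with {len(punctured)} punctured stripe(s): {punctured}")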
The advantage of puncturing an array is that it keeps the system available in production until the redundancy of the array is restored. The data in the affected stripe is lost whether the RAID puncture occurs or not. The primary disadvantage of this method is that, while the array has a RAID puncture in it, uncorrectable errors will continue to be encountered whenever the impacted data (if any) is accessed.
A RAID puncture can occur in the following three locations:
• In blank space that contains no data. That stripe will be inaccessible, but since there is no data in that location, the puncture has no significant impact. Any attempt by the operating system to write to a RAID punctured stripe fails, and the data is written to a different location.
• In a stripe that contains data that is not critical, such as a README.TXT file. If the impacted data is not accessed, no errors are generated during normal I/O. However, attempts to perform a file system backup fail to back up any files impacted by the RAID puncture, and performing a Check Consistency or Patrol Read operation generates Sense code 3/11/00 for the applicable LBAs and stripes (see the sketch after this list).
• In data space that is accessed. In this case, the lost data can cause a variety of errors. The errors can be minor errors that do not adversely impact a production environment, or they can be more severe and prevent the system from booting to an operating system or cause applications to fail.
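
For reference only (this decoding follows the standard SCSI sense-data convention rather than text from the guide), the sketch below interprets the Sense code 3/11/00 that Check Consistency or Patrol Read logs against a punctured LBA: sense key 3h is Medium Error, and ASC/ASCQ 11h/00h is Unrecovered Read Error.

    # Minimal decoder for the "Sense code: 3/11/00" entries logged against
    # punctured LBAs. Values follow the standard SCSI sense-data convention.
    SENSE_KEYS = {0x3: "MEDIUM ERROR"}
    ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}

    def decode_sense(key: int, asc: int, ascq: int) -> str:
        key_name = SENSE_KEYS.get(key, f"sense key {key:#x}")
        detail = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:#04x}/{ascq:#04x}")
        return f"{key_name}: {detail}"

    # Sense code 3/11/00 -- the signature of an unrecoverable (punctured) LBA.
    print(decode_sense(0x3, 0x11, 0x00))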
An array that is RAID punctured will eventually have to be deleted and recreated to eliminate the RAID puncture. This procedure causes all data to be erased; the data must then be recreated or restored from backup after the RAID puncture is eliminated. The resolution for a RAID puncture can therefore be scheduled for a time that is more advantageous to the needs of the business.
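
As a hedged outline only (the guide does not prescribe specific commands, and these step descriptions are placeholders rather than controller operations), the sequence above can be expressed as an ordered checklist:

    # Illustrative outline of the only permanent fix for a punctured array.
    RESOLUTION_STEPS = [
        "Back up or otherwise preserve all recoverable data on the array",
        "Schedule a maintenance window that suits the needs of the business",
        "Delete the punctured virtual disk",
        "Recreate the virtual disk on the same (or replaced) physical disks",
        "Restore or recreate the data from backup",
        "Verify the array (for example, with a Check Consistency) before returning it to production",
    ]

    for number, step in enumerate(RESOLUTION_STEPS, start=1):
        print(f"{number}. {step}")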
If the data within a RAID punctured stripe is accessed, errors will continue to be reported against the affected bad LBAs with no possible correction available. Eventually (this could take minutes, days, weeks, or months), the Bad Block Management (BBM) table fills up, causing one or more drives to be flagged as predictive failure. As seen in the figure, drive 0 is typically the drive that gets flagged as predictive failure, because the errors on drive 1 and drive 2 are propagated to it. Drive 0 may actually be working normally; replacing drive 0 only causes the replacement drive to eventually be flagged as predictive failure as well.
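
The final sketch is an illustration of the behavior described above, not actual PERC or BBM internals, and it uses an assumed table capacity. It shows why drive 0 ends up flagged: every access to a punctured LBA charges another entry to drive 0's BBM table, and a replacement drive would fill its table the same way because the punctured stripes remain.

    # Hypothetical model of a per-drive Bad Block Management (BBM) table.
    BBM_CAPACITY = 256          # assumed table size, for illustration only
    bbm_entries = {0: 0, 1: 0, 2: 0}
    predictive_failure = set()

    def record_bad_lba(drive: int) -> None:
        """Log one bad-block entry and flag the drive when its table is full."""
        bbm_entries[drive] += 1
        if bbm_entries[drive] >= BBM_CAPACITY:
            predictive_failure.add(drive)

    # Errors from drives 1 and 2 were propagated to drive 0 during the rebuild,
    # so repeated reads of the punctured stripes keep charging entries to drive 0.
    for _ in range(300):
        record_bad_lba(drive=0)

    print(f"Drives flagged as predictive failure: {sorted(predictive_failure)}")
    # Replacing drive 0 does not help: the punctured stripes remain, so the
    # replacement drive's BBM table fills up in the same way.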