HP AD510A HP StorageWorks 1500 Modular Smart Array maintenance and service gui - Page 98

Compromised fault tolerance, Recovering from compromised fault tolerance (enabling failed LUNs)



• RAID 6 configurations can tolerate simultaneous failure of two hard drives in the array.
Compromised fault tolerance
Each RAID configuration has an inherent limit on the number of physical hard drive failures that it can
tolerate. If more hard drives fail than the fault-tolerance method allows, fault tolerance is compromised.
When the MSA determines that the fault tolerance of a LUN is compromised, the LUN is taken offline
and subsequent I/O requests are rejected. This behavior is designed to protect the integrity of the LUN,
but manual intervention is then required to recover or re-enable the LUN. You are likely to lose data,
although it can sometimes be recovered.
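As an illustration only, the rule above can be written as a small Python check. The function name and the lookup table are hypothetical and are not part of any MSA tool. The RAID 6 limit of two drives comes from this page; the RAID 0 and RAID 5 limits are the standard properties of those levels and are assumed here, not quoted from the guide.

```python
# Illustrative sketch only: not part of the ACU, the MSA-CLI, or MSA firmware.
# Maximum number of simultaneous drive failures each RAID level tolerates.
# RAID 6 -> 2 is stated in this guide; RAID 0 -> 0 and RAID 5 -> 1 are
# standard RAID properties assumed for this example.
MAX_FAILED_DRIVES = {
    "RAID 0": 0,
    "RAID 5": 1,
    "RAID 6": 2,
}


def is_fault_tolerance_compromised(raid_level, failed_drives):
    """Return True if more drives have failed than the RAID level tolerates."""
    if raid_level not in MAX_FAILED_DRIVES:
        raise ValueError("unknown RAID level: " + raid_level)
    if failed_drives < 0:
        raise ValueError("failed_drives must not be negative")
    return failed_drives > MAX_FAILED_DRIVES[raid_level]


if __name__ == "__main__":
    # A second failure during a RAID 5 rebuild exceeds the limit of one.
    print(is_fault_tolerance_compromised("RAID 5", 2))
    # Two failed drives are still within the RAID 6 limit.
    print(is_fault_tolerance_compromised("RAID 6", 2))
```

The check is strictly "greater than": a LUN that has lost exactly as many drives as its RAID level allows is degraded but not yet compromised.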
Common causes of compromised fault tolerance include:
• More hard drives fail than the LUN can tolerate.
  For example, in a RAID 5 array, a hard drive fails while another drive in the same array is
  being rebuilt. If the array has no online spare, any logical drives in this array that are configured with
  RAID 5 fault tolerance will fail.
• A SCSI cable is broken or disconnected.
• Power is temporarily lost.
  For example, if both power supplies are inappropriately connected to the same power source and that
  power source is interrupted, fault tolerance may be compromised.
Recovering from compromised fault tolerance (enabling failed LUNs)
If fault tolerance is compromised, inserting replacement hard drives does not improve the condition of the
logical unit. The procedure to re-enable or accept a LUN that is unresponsive is performed in the Array
Configuration Utility (ACU) or the MSA Command Line Interface (MSA-CLI).
1. Stop all I/O activity.
2. Turn off the system as described in Removing power from the MSA.
3. Check for loose, dirty, broken, or bent cabling and connectors on all devices.
4. Remove and then reinsert all hard drives and controllers.
   CAUTION: Data can be lost if the hard drives are not firmly reseated.
5. Turn the system on as described in Applying power to the MSA.
   NOTE: In some cases, a marginal hard drive might work again for long enough to allow you
   to make copies of important files.
6. If using the MSA LCD panel:
   a. If one of the following messages is displayed on the MSA array controller LCD front panel, an
      issue was found with one or more configured LUNs that may result in data loss, so all of the
      hard drives in the LUNs have been disabled. Press the right push button to re-enable the LUNs.
      02 ENABLE VOLUME <n>? '<'=NO, '>'=YES
      04 ENABLE VOLUMES ? '<'=NO, '>'=YES
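For anyone recording LCD messages in a service log, the two prompts above can be told apart with a short Python sketch. Everything here is hypothetical: the function name, the idea that the message text is available as a string (for example, typed in by the operator), and the assumption that <n> appears as a decimal volume number. The MSA itself provides no such interface in this guide.

```python
import re

# Illustrative sketch only. Matches the two prompts listed in this guide:
#   02 ENABLE VOLUME <n>? '<'=NO, '>'=YES   (one volume, numbered <n>)
#   04 ENABLE VOLUMES ? '<'=NO, '>'=YES     (several volumes)
# Assumption: <n> is shown on the panel as a decimal number.
_PROMPT = re.compile(r"^(?P<code>02|04) ENABLE VOLUMES? ?(?P<volume>\d+)?\s*\?")


def parse_enable_prompt(text):
    """Return the message code and volume number, or None if not an enable prompt."""
    match = _PROMPT.match(text.strip())
    if match is None:
        return None
    volume = match.group("volume")
    return {
        "code": match.group("code"),
        "volume": int(volume) if volume is not None else None,
    }


if __name__ == "__main__":
    print(parse_enable_prompt("02 ENABLE VOLUME 3? '<'=NO, '>'=YES"))
    print(parse_enable_prompt("04 ENABLE VOLUMES ? '<'=NO, '>'=YES"))
```

In both cases the guide's instruction is the same: pressing the right push button ('>') answers YES and re-enables the LUNs.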
98
Hard drive failures and faulted LUNs