HP StorageWorks MSA1510i HP StorageWorks 1510i Modular Smart Array installatio - Page 94

Compromised fault tolerance, Recovering from compromised fault tolerance enabling failed LUNs


Compromised fault tolerance

Each RAID configuration has inherent limitations on the number of physical hard drive failures that it can tolerate. If more hard drives fail than the fault-tolerance method allows, fault tolerance is compromised. When the MSA determines that the fault tolerance of a LUN is compromised, it takes the LUN offline and rejects subsequent I/O requests. This protects the integrity of the LUN, but manual intervention is required to recover or re-enable the LUN. You are likely to lose data, although it can sometimes be recovered.

Common causes of compromised fault tolerance include:

• More hard drives fail than the LUN can tolerate. For example, in a RAID 5 array, a hard drive fails while another drive in the array is being rebuilt. If the array has no online spare, any logical drives in this array that are configured with RAID 5 fault tolerance will fail.
• A SCSI cable is broken or disconnected.
• Power is temporarily lost. For example, if both power supplies are inappropriately connected to the same power source and that power source is interrupted, fault tolerance may be compromised.

Recovering from compromised fault tolerance (enabling failed LUNs)

If fault tolerance is compromised, inserting replacement hard drives does not improve the condition of the logical unit. The procedure to re-enable or accept a LUN that is unresponsive is performed in the Array Configuration Utility (ACU), Storage Management Utility (SMU), or the MSA Command Line Interface (MSA-CLI).

1. Stop all I/O activity.
2. Turn off the system as described in Powering off the MSA1510i.
3. Check for loose, dirty, broken, or bent cabling and connectors on all devices.
4. Remove and then reinsert all hard drives and controllers.
   CAUTION: Data can be lost if the hard drives are not firmly reseated.
5. Turn the system on as described in Powering on the MSA.

NOTE: In some cases, a marginal hard drive might work again for long enough to allow you to make copies of important files.
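The failure limit described under Compromised fault tolerance can be illustrated with a small sketch. It is not part of the MSA firmware or any HP utility. The table of tolerated failures reflects general properties of the named RAID levels and is not taken from this guide, and the function name and arguments are hypothetical.

```python
# Illustrative sketch only; not MSA firmware logic or an HP API.
# Tolerated failures per array are general RAID properties,
# not values quoted from the MSA1510i documentation.
TOLERATED_FAILURES = {
    "RAID 0": 0,  # striping only, no redundancy
    "RAID 5": 1,  # single distributed parity
    "RAID 6": 2,  # dual distributed parity
}


def is_fault_tolerance_compromised(raid_level: str, unavailable_drives: int) -> bool:
    """Return True if more drives are unavailable than the RAID level tolerates.

    unavailable_drives counts drives that have failed or that are still
    being rebuilt, since neither kind holds a complete copy of its data.
    """
    if unavailable_drives < 0:
        raise ValueError("unavailable_drives must not be negative")
    return unavailable_drives > TOLERATED_FAILURES[raid_level]


# The RAID 5 example from the text: one drive is being rebuilt
# when a second drive fails, so two drives are unavailable.
print(is_fault_tolerance_compromised("RAID 5", 2))  # True
```

With a single unavailable drive, the same RAID 5 array is degraded but still within its limit, so the function returns False.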

