HP BL860c Smart Array 6400 Series Controllers for Integrity Servers User Guide - Page 24



Replacing, moving, or adding hard drives
• RAID 0 configurations cannot tolerate drive failure. If any physical drive in the array fails, all
  non-fault-tolerant (RAID 0) logical drives in the same array will also fail.
• RAID 1+0 configurations can tolerate multiple drive failures as long as no failed drives are mirrored
  to one another.
• RAID 5 configurations can tolerate one drive failure.
• RAID ADG configurations can tolerate simultaneous failure of two drives.
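The tolerance rules above can be expressed as a small check that decides whether a logical drive stays online for a given set of failed physical drives. The following Python function is a minimal sketch, not part of any HP tool. The function name survives and the drive-numbering scheme (drives numbered 0 to n-1, with drive 2k mirrored to drive 2k+1 in RAID 1+0) are hypothetical assumptions; the actual mirror pairing used by a Smart Array controller may differ.

```python
"""Sketch: does a logical drive survive a given set of failed physical drives?

Hypothetical assumptions (not from the guide): drives are numbered 0..n-1,
and in RAID 1+0 drive 2k is mirrored with drive 2k+1. Real controllers may
pair drives differently.
"""
from typing import Iterable


def survives(raid_level: str, num_drives: int, failed: Iterable[int]) -> bool:
    """Return True if the logical drive stays online with these drives failed."""
    failed_set = set(failed)
    if not failed_set <= set(range(num_drives)):
        raise ValueError("failed drive index out of range")
    count = len(failed_set)

    if raid_level == "RAID 0":
        # No redundancy: any failure takes the logical drive offline.
        return count == 0
    if raid_level == "RAID 1+0":
        if num_drives % 2:
            raise ValueError("RAID 1+0 needs an even number of drives")
        # Survives as long as no mirror pair (2k, 2k+1) has lost both members.
        return all(not {k, k + 1} <= failed_set for k in range(0, num_drives, 2))
    if raid_level == "RAID 5":
        # Survives at most one failed drive.
        return count <= 1
    if raid_level == "RAID ADG":
        # Survives up to two simultaneous drive failures.
        return count <= 2
    raise ValueError(f"unknown RAID level: {raid_level}")


if __name__ == "__main__":
    print(survives("RAID 1+0", 6, {0, 3, 5}))  # True: one drive lost from each pair
    print(survives("RAID 1+0", 6, {2, 3}))     # False: both members of one pair
    print(survives("RAID 5", 4, {1, 2}))       # False: two failures exceed RAID 5
    print(survives("RAID ADG", 6, {1, 4}))     # True: two failures are tolerated
```

The RAID 1+0 case shows why the count of failed drives alone is not enough: three failures can be survivable while two can be fatal, depending on which drives are mirrored to one another.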
Compromised fault tolerance
If more hard drives fail than the fault-tolerance method allows, fault tolerance is compromised, and the
logical drive fails. In this case, all requests from the operating system are rejected with unrecoverable
errors. You are likely to lose data, although it can sometimes be recovered (refer to "Recovering from
compromised fault tolerance" on page 24).
For example, fault tolerance can be compromised when a drive in an array fails while another drive in
the same array is being rebuilt. If the array has no online spare, any logical drives in this array that
are configured with RAID 5 fault tolerance will fail.
Compromised fault tolerance can also be caused by non-drive problems, such as a faulty cable or
temporary power loss to a storage system. In such cases, you do not need to replace the physical drives.
However, you may still have lost data, especially if the system was busy at the time that the problem
occurred.
Recovering from compromised fault tolerance
If fault tolerance is compromised, inserting replacement drives does not improve the condition of the
logical volume. Instead, if the screen displays unrecoverable error messages, perform the following
procedure to recover data:
1. Power down the entire system, and then power it back up. In some cases, a marginal drive will work
   again for long enough to enable you to make copies of important files.
   If a 1779 POST message is displayed, press the F2 key to re-enable the logical volumes. Remember
   that data loss has probably occurred and any data on the logical volume is suspect.
2. Make copies of important data, if possible.
3. Replace any failed drives.
4. After you have replaced the failed drives, fault tolerance may again be compromised. If so, cycle the
   power again. If the 1779 POST message is displayed:
   a. Press the F2 key to re-enable the logical drives.
   b. Recreate the partitions.
   c. Restore all data from backup.
To minimize the risk of data loss that is caused by compromised fault tolerance, make frequent backups of
all logical volumes.
Replacing hard drives
The most common reason for replacing a hard drive is that it has failed. However, another reason is to
gradually increase the storage capacity of the entire system ("Upgrading hard drive capacity" on page 26).
If you insert a hot-pluggable drive into a drive bay while the system power is on, all disk activity in the
array pauses while the new drive is spinning up. This spin-up process usually lasts for approximately 20
seconds. When the drive has achieved its normal spin rate, data recovery to the replacement drive begins