HP StorageWorks 70 Modular Smart Array Enclosure user guide - Page 37

Factors to consider before replacing hard drives



Compromised fault tolerance
If more hard drives fail than the fault-tolerance method allows, fault tolerance is compromised, and the
logical drive fails. In this case, all requests from the operating system are rejected with unrecoverable
errors. You are likely to lose data, although it can sometimes be recovered.
One example of a situation in which compromised fault tolerance may occur is when a drive in an array
fails while another drive in the array is being rebuilt. If the array has no online spare, any logical drives
in this array that are configured with RAID 5 fault tolerance will fail.
Compromised fault tolerance can also be caused by non-drive problems, such as a faulty cable or
temporary power loss to a storage system. In such cases, you do not need to replace the physical
drives. However, you may still have lost data, especially if the system was busy at the time that the
problem occurred.
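The rule in this section comes down to a count of unavailable drives. The following is a minimal sketch, not an HP tool: the function and table names are hypothetical, the RAID 5 and RAID 6 with ADG limits follow the text of this guide, and the RAID 0 entry is general RAID knowledge added for contrast. It assumes a drive that is still being rebuilt counts as unavailable, as in the RAID 5 example above, and it does not model online spares.

```python
# Illustrative only: names are hypothetical, not part of any HP utility.
# Number of simultaneously unavailable drives each method can absorb.
MAX_UNAVAILABLE = {
    "RAID 0": 0,        # no fault tolerance (general RAID knowledge)
    "RAID 5": 1,        # one drive, per the rebuild example in this guide
    "RAID 6 (ADG)": 2,  # two drives, per this guide
}


def fault_tolerance_compromised(raid_level: str, failed: int, rebuilding: int = 0) -> bool:
    """Return True if more drives are unavailable than the method allows.

    A drive that is still being rebuilt does not yet hold valid data,
    so it is counted as unavailable alongside the failed drives.
    """
    unavailable = failed + rebuilding
    return unavailable > MAX_UNAVAILABLE[raid_level]


# RAID 5 with one drive rebuilding while a second drive fails: compromised.
print(fault_tolerance_compromised("RAID 5", failed=1, rebuilding=1))  # True
# RAID 6 with ADG absorbs two simultaneous failures.
print(fault_tolerance_compromised("RAID 6 (ADG)", failed=2))          # False
```

A result of True corresponds to the state in which the logical drive fails and the recovery procedure applies.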
Recovering from compromised fault tolerance
If fault tolerance is compromised, inserting replacement drives does not improve the condition of the
logical volume. Perform the following procedure to recover data:
1. Power down the enclosure (see Powering down).
2. Check for loose, dirty, broken, or bent cabling and connectors on all devices.
3. Power up the enclosure (see Powering up).
NOTE: In some cases, a marginal drive is operational long enough to allow backups of important files.
4. Make copies of important data, if possible.
5. Replace any failed drives. Read Factors to consider before replacing hard drives before replacing the failed hard drives.
Factors to consider before replacing hard drives
In systems that use external data storage, be sure that the server is the first unit to be powered down and
the last to be powered back up. Taking this precaution ensures that the system does not erroneously mark
the drives as failed when the server is powered up.
Before replacing a degraded drive:
• Open HP SIM and inspect the Error Counter window for each physical drive in the same array to confirm that no other drives have any errors. For details, see the HP SIM documentation on the Management CD.
• Be sure that the array has a current, valid backup.
• Use replacement drives that have a capacity at least as great as that of the smallest drive in the array. The controller immediately fails drives that have insufficient capacity.
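The three checks above can be written down as a small checklist. This is a sketch under stated assumptions: the Drive record and the function name are hypothetical, the error counts are assumed to be copied by hand from the HP SIM Error Counter window (no HP SIM interface is called), and capacities are assumed to be in gigabytes.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    bay: int
    capacity_gb: int
    error_count: int  # read manually from the HP SIM Error Counter window


def pre_replacement_issues(array, degraded_bay, replacement_capacity_gb, backup_is_current):
    """Return a list of reasons not to replace the degraded drive yet."""
    issues = []
    # No other drive in the same array should report errors.
    for drive in array:
        if drive.bay != degraded_bay and drive.error_count > 0:
            issues.append(f"bay {drive.bay} reports {drive.error_count} error(s)")
    # The array needs a current, valid backup.
    if not backup_is_current:
        issues.append("no current, valid backup")
    # The replacement must be at least as large as the smallest drive.
    smallest = min(drive.capacity_gb for drive in array)
    if replacement_capacity_gb < smallest:
        issues.append(
            f"replacement is {replacement_capacity_gb} GB, "
            f"smallest drive in the array is {smallest} GB"
        )
    return issues


array = [Drive(1, 146, 0), Drive(2, 146, 3), Drive(3, 146, 0)]
for issue in pre_replacement_issues(array, degraded_bay=2,
                                    replacement_capacity_gb=72,
                                    backup_is_current=True):
    print(issue)
```

An empty list means that none of the three conditions was violated. It does not replace the removal precautions that follow.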
To minimize the likelihood of fatal system errors, take these precautions when removing failed drives:
• Do not remove a degraded drive if any other drive in the array is offline (the online LED is off). In this situation, no other drive in the array can be removed without data loss.
Exceptions:
  • When RAID 1+0 is used, drives are mirrored in pairs. Several drives can be in a failed condition simultaneously (and they can all be replaced simultaneously) without data loss, as long as no two failed drives belong to the same mirrored pair.
  • When RAID 6 with ADG is used, two drives can fail simultaneously (and be replaced simultaneously) without data loss.
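The removal rule and its two exceptions can be summarized as a single check. This is an illustrative sketch only: the function name is hypothetical, the set of failed bays is assumed to include the drive being replaced together with any other failed or offline drive, and the mirrored-pair layout in the example is invented, because the actual pairing is determined by the controller.

```python
def can_replace_without_data_loss(raid_level, failed_bays, mirror_pairs=()):
    """Apply the removal precaution and its RAID 1+0 / RAID 6 (ADG) exceptions.

    failed_bays:  set of bays whose drives are failed or offline,
                  including the drive that is about to be replaced.
    mirror_pairs: (bay_a, bay_b) tuples; only used for RAID 1+0.
    """
    if raid_level == "RAID 1+0":
        # Safe as long as no two failed drives share a mirrored pair.
        return not any(a in failed_bays and b in failed_bays
                       for a, b in mirror_pairs)
    if raid_level == "RAID 6 (ADG)":
        # Two drives can fail and be replaced simultaneously.
        return len(failed_bays) <= 2
    # Otherwise: do not remove a drive if any other drive is offline.
    return len(failed_bays) <= 1


pairs = [(1, 2), (3, 4)]  # hypothetical pairing for a four-drive array
print(can_replace_without_data_loss("RAID 1+0", {1, 3}, pairs))  # True
print(can_replace_without_data_loss("RAID 1+0", {1, 2}, pairs))  # False
print(can_replace_without_data_loss("RAID 5", {1, 2}))           # False
```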