HP 418800-B21 HP StorageWorks 70 Modular Smart Array Enclosure User Guide (434 - Page 35

Effects of a disk drive failure, Compromised fault tolerance



• CPQONLIN identifies failed drives in a NetWare environment.
For additional information about diagnosing disk drive problems, see the HP ProLiant Servers Troubleshooting Guide.
CAUTION: Sometimes, a drive that has previously failed may seem to be operational after the system is
power-cycled or, for a hot-pluggable drive, after the drive has been removed and reinserted. However,
continued use of such marginal drives may eventually result in data loss. Replace the marginal drive
as soon as possible.
Effects of a disk drive failure
When a disk drive fails, all logical drives that are in the same array are affected. Each logical drive
in an array may be using a different fault-tolerance method, so each logical drive can be affected
differently.
• RAID 0 configurations cannot tolerate drive failure. If any physical drive in the array fails, all non-fault-tolerant (RAID 0) logical drives in the same array also fail.
• RAID 1+0 configurations can tolerate multiple drive failures as long as no failed drives are mirrored to one another (with no spares assigned).
• RAID 5 configurations can tolerate one drive failure (with no spares assigned).
• RAID 6 with ADG configurations can tolerate simultaneous failure of two drives (with no spares assigned).
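The rules in the list above can be sketched as a small check. This is only an illustration, not part of any HP utility: the function name `logical_drive_survives`, the drive numbering, and the `mirror_pairs` layout are all hypothetical, and the sketch assumes no spares are assigned, as in the list.

```python
from typing import Iterable, Set, Tuple


def logical_drive_survives(raid_level: str,
                           failed: Set[int],
                           mirror_pairs: Iterable[Tuple[int, int]] = ()) -> bool:
    """Return True if a logical drive stays online (hypothetical helper).

    raid_level   -- "0", "1+0", "5", or "6"
    failed       -- numbers of the failed physical drives in the array
    mirror_pairs -- for RAID 1+0 only: (drive, mirror) pairs (assumed layout)
    """
    if raid_level == "0":
        # RAID 0 cannot tolerate any drive failure.
        return len(failed) == 0
    if raid_level == "1+0":
        # RAID 1+0 fails only if both drives of one mirrored pair have failed.
        return not any(a in failed and b in failed for a, b in mirror_pairs)
    if raid_level == "5":
        # RAID 5 tolerates one failed drive.
        return len(failed) <= 1
    if raid_level == "6":
        # RAID 6 with ADG tolerates two simultaneous failures.
        return len(failed) <= 2
    raise ValueError(f"unsupported RAID level: {raid_level}")


# Example: four drives mirrored as (0, 1) and (2, 3).
pairs = [(0, 1), (2, 3)]
print(logical_drive_survives("1+0", {0, 2}, pairs))  # drives from different pairs
print(logical_drive_survives("1+0", {0, 1}, pairs))  # both drives of one pair
```

Because each logical drive in an array can use a different fault-tolerance method, such a check would be applied once per logical drive, with the same set of failed physical drives.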
Compromised fault tolerance
If more disk drives fail than the fault-tolerance method allows, fault tolerance is compromised, and
the logical drive fails. In this case, all requests from the operating system are rejected with
unrecoverable errors. You are likely to lose data, although it can sometimes be recovered.
For example, fault tolerance may be compromised when a drive in an array fails while another drive
in the array is being rebuilt. If the array has no online spare, any logical drives in this array that
are configured with RAID 5 fault tolerance will fail.
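The rebuild scenario can be worked through with a short sketch. It assumes that a drive still being rebuilt does not yet hold a complete copy of its data and therefore counts as unavailable. The names `TOLERATED_FAILURES` and `fault_tolerance_compromised` are hypothetical, and RAID 1+0 is left out because its outcome depends on which drives are mirrored to one another.

```python
# Number of unavailable drives each method tolerates (no spares assigned).
TOLERATED_FAILURES = {"0": 0, "5": 1, "6": 2}


def fault_tolerance_compromised(raid_level, failed_drives, rebuilding_drives):
    """Return True if more drives are unavailable than the method allows.

    Assumption: a drive that is still rebuilding is treated as unavailable.
    """
    unavailable = set(failed_drives) | set(rebuilding_drives)
    return len(unavailable) > TOLERATED_FAILURES[raid_level]


# Drive 0 is being rebuilt when drive 2 fails.
print(fault_tolerance_compromised("5", {2}, {0}))  # two unavailable, RAID 5 allows one
print(fault_tolerance_compromised("6", {2}, {0}))  # two unavailable, RAID 6 allows two
```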
Compromised fault tolerance can also be caused by non-drive problems, such as a faulty cable or
temporary power loss to a storage system. In such cases, you do not need to replace the physical
drives. However, you may still have lost data, especially if the system was busy at the time that the
problem occurred.
Recovering from compromised fault tolerance
If fault tolerance is compromised, inserting replacement drives does not improve the condition of the
logical volume. Perform the following procedure to recover data:
1. Power down the enclosure (see Powering off disk enclosures).
2. Check for loose, dirty, broken, or bent cabling and connectors on all devices.