HP 411508-B21 RAID 6 with HP Advanced Data Guarding technology: a cost-effecti - Page 5



Fault tolerance of RAID schemes
Often, the terms reliability and fault tolerance are used interchangeably in describing RAID schemes;
however, there is a distinction between them. Reliability refers to the likelihood that an individual
drive or drive array will continue to function without experiencing a failure. Reliability is typically
measured over some period of time.
Fault tolerance, on the other hand, is the ability of an array to withstand and recover from a drive
failure. Fault tolerance is provided by some sort of redundancy—mirroring, parity, or a combination
of both—and it is typically measured by the number of drives that can fail without causing the entire
array to fail. The fault tolerance of various RAID levels is as follows:
• RAID 0 has no fault tolerance because it provides no redundancy. The array will fail if any one
  physical drive fails.
• With RAID 1 or RAID 1+0, up to n/2 hard drives can fail without causing array failure, provided
  that no failed drive is the mirror of another failed drive. In practice, a logical drive is likely
  to fail before n/2 drives have failed, because the array fails as soon as a drive and its mirror
  both fail. However, the probability that a second failure hits the mirror of an already failed
  drive decreases as the number of mirrored pairs increases.
• RAID 5 can withstand the failure of only one physical drive. If a second drive fails before the
  first failed drive is replaced, the array will fail. Therefore, HP recommends the use of an online
  spare drive in RAID 5 configurations. With an online spare drive in the array, a rebuild of the
  data from the failed drive begins immediately when a drive fails. The array will then fail only
  if a second drive fails during the brief process of rebuilding the data onto the spare drive.
• RAID 6 can withstand the failure of two physical drives. Three hard drives must fail before the
  entire array will fail. RAID 6 also protects against the loss of data if a drive fails and a
  defect occurs in a single sector of another drive. This is important if data is being rebuilt
  after a drive failure and a media defect occurs in one of the good drives.
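The rules in the list above can be sketched as a small check that decides whether an array is
still alive for a given set of failed drives. This is only an illustration, not HP controller
logic: the function name array_survives is hypothetical, and the mirror layout (drives 2k and
2k+1 form a pair) is an assumption made for the example.

```python
def array_survives(level: str, n_drives: int, failed: set) -> bool:
    """Return True if the array is still usable after the drives in `failed` have failed.

    Drives are numbered 0..n_drives-1. For mirrored levels, this sketch ASSUMES
    that drives 2k and 2k+1 are mirrors of each other (hypothetical layout).
    """
    if any(d < 0 or d >= n_drives for d in failed):
        raise ValueError("failed drive index out of range")

    if level == "RAID 0":
        # No redundancy: any single failure is fatal.
        return len(failed) == 0
    if level in ("RAID 1", "RAID 1+0"):
        # Fatal only if a drive and its mirror partner have both failed.
        return not any((d ^ 1) in failed for d in failed)
    if level == "RAID 5":
        # One drive's worth of parity: tolerates exactly one failure.
        return len(failed) <= 1
    if level == "RAID 6":
        # Two drives' worth of parity: tolerates two failures.
        return len(failed) <= 2
    raise ValueError(f"unsupported RAID level: {level}")
```

For example, under the assumed layout an 8-drive RAID 1+0 array survives the loss of drives
0, 2, 4, and 6 (n/2 failures, one per pair), but not the loss of drives 0 and 1 together.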
Although RAID 1 and RAID 1+0 provide a higher level of fault tolerance than RAID 5, that protection
comes at a very high price, because 50 percent of the drives are dedicated to fault protection. For
RAID 5 configurations, HP recommends using no more than 14 physical drives per array due to the
increased likelihood of drive array failure with more hard drives.
For arrays of more than 14 drives, HP recommends RAID 6 for its fault tolerance and storage
efficiency. RAID 6 can effectively protect an array containing the maximum number of drives
supported by a variety of Smart Array Controllers. Controller specifications are available online from
this web page: www.hp.com/products/smartarray.
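The storage cost mentioned above can be made concrete with a short calculation. The sketch below
assumes the usual capacity accounting for each level (half the drives for mirroring, one drive's
worth of parity for RAID 5, two for RAID 6) and equal-sized drives; the function name
usable_fraction is hypothetical.

```python
def usable_fraction(level: str, n_drives: int) -> float:
    """Fraction of raw capacity available for data, assuming equal-sized drives."""
    if level == "RAID 0":
        return 1.0                          # no capacity spent on redundancy
    if level in ("RAID 1", "RAID 1+0"):
        return 0.5                          # every drive has a mirror
    if level == "RAID 5":
        return (n_drives - 1) / n_drives    # one drive's worth of parity
    if level == "RAID 6":
        return (n_drives - 2) / n_drives    # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("RAID 1+0", "RAID 5", "RAID 6"):
        print(level, round(usable_fraction(level, 14), 3))
```

Under these assumptions, a 16-drive RAID 6 array keeps 14/16 of its raw capacity for data,
compared with 8/16 for RAID 1+0 on the same drives.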
Figure 1 shows the relative probability of logical drive failure for different RAID levels and different
logical drive sizes, assuming the array contains no online spares. A logical drive failure is less likely
with RAID 6 than with RAID 0, RAID 5, and RAID 1+0.
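To give a feel for this kind of comparison, the following is a deliberately simplified model and
not the one behind Figure 1. It assumes each drive fails independently with the same probability
p during some fixed window, with no rebuilds or spares, and that RAID 1+0 uses n/2 independent
mirrored pairs. The function names and the value of p are assumptions for illustration only.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent drives fail (binomial model)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def p_array_failure(level: str, n_drives: int, p: float) -> float:
    """Toy probability of logical drive failure with per-drive failure probability p."""
    if level == "RAID 0":
        return p_at_least(1, n_drives, p)   # any failure is fatal
    if level == "RAID 5":
        return p_at_least(2, n_drives, p)   # fatal at the second failure
    if level == "RAID 6":
        return p_at_least(3, n_drives, p)   # fatal at the third failure
    if level == "RAID 1+0":
        pairs = n_drives // 2
        # Fatal if both drives of any mirrored pair fail.
        return 1.0 - (1.0 - p * p) ** pairs
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("RAID 0", "RAID 5", "RAID 1+0", "RAID 6"):
        print(f"{level:8s} {p_array_failure(level, 14, 0.01):.6f}")
```

With the assumed p = 0.01 and 14 drives, this toy model ranks RAID 6 as least likely to fail,
followed by RAID 1+0, RAID 5, and RAID 0. The ordering depends on p and on the number of drives,
so it should not be read as a substitute for the figure.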
An online spare (hot spare) can be added to any of the fault-tolerant RAID levels to reduce the
probability of logical drive failure: as soon as a drive fails, the missing data is automatically
rebuilt onto the online spare from parity data (or, for mirrored RAID levels, from the surviving
mirror drive). Without an online spare, there is a greater chance of array failure and consequent
loss of data if a second drive fails before the failed drive can be replaced. Data loss is less
likely with RAID 6 than with RAID 5 because RAID 6 can sustain the failure of
two drives. RAID 6 supports online spare drives and Online RAID Level Migration from any other
RAID level. [3]

[3] For more information about online spare drives and Online RAID Level Migration from RAID 1 or RAID 5, refer
to the HP Smart Array 6400 Series Controller Support Guide, http://docs.hp.com/en/J6369-90011/index.html