HP P4000 HP Smart Array SAS controllers for Integrity servers support guide - Page 111




For example, in the following sautil <device_file> command output excerpt, spare disk 1I:1:10 is being substituted for failed disk 1I:1:11, which is why the logical drive is in the RECOVERING state.
---- LOGICAL DRIVE SUMMARY ---------------------------------------------------
#   RAID   Size       Status
0   1+0    34700 MB   RECOVERING
---- SAS/SATA DEVICE SUMMARY -------------------------------------------------
Location  Ct  Enc  Bay  WWID                Type  Capacity  Status
internal  1I  1    12   0x500000e01117c732  DISK  36.4 GB   OK
N/A       1I  1    11   0x500000e01115c352  N/A   N/A       FAILED
internal  1I  1    10   0x5000c5000032b839  DISK  36.4 GB   SPARE (activated)
internal  1I  1    9    0x5000c5000030b0c5  DISK  36.4 GB   UNASSIGNED
internal  2I  1    16   0x500000e011213482  DISK  36.4 GB   UNASSIGNED
internal  2I  1    15   0x5000c500002084c9  DISK  73.4 GB   UNASSIGNED
internal  2I  1    14   0x5000c5000030b9c9  DISK  36.4 GB   UNASSIGNED
internal  2I  1    13   0x500000e01118a7a2  DISK  36.4 GB   UNASSIGNED
---- SAS/SATA ENCLOSURE SUMMARY ----------------------------------------------
Location  Ct  Enc  Expander_count  Bay_count  SEP_count
internal  1I  1    0               4          1
internal  2I  1    0               4          1
---- LOGICAL DRIVE 0 ---------------------------------------------------------
Logical Drive Device File........... c5t0d0
Fault Tolerance Mode................ RAID 1+0 (Disk Mirroring)
Logical Drive Size.................. 34700 MB
Logical Drive Status................ OK
# of Participating Physical Disks... 2
Participating Physical Disk(s)...... Ct:Enc:Bay:WWID
                                     1I:1:12:0x500000e01117c732
                                     1I:1:11:0x500000e01115c352 <-- NOT RESPONDING
Participating Spare Disk(s)......... Ct:Enc:Bay:WWID
                                     1I:1:10:0x5000c5000032b839 <-- activated for 1I:1:11:0x500000e01115c352
Stripe Size......................... 128 KB
Logical Drive Cache Status.......... cache enabled
Configuration Signature............. 0xA00148CC
Media Exchange Detected?............ no
For more information about the sautil command, see “The sautil command” (page 66).
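As a rough illustration of how the device summary can be read programmatically, the following Python sketch extracts failed disks and activated spares from rows laid out like the SAS/SATA DEVICE SUMMARY excerpt. The row layout, the regular expression, and the function names (parse_device_rows, summarize) are assumptions made for this example; they are not part of sautil, and real output may differ between firmware and utility versions.

```python
import re

# Sample rows copied from the excerpt; the column spacing is an assumption.
SAMPLE = """\
internal  1I  1  12  0x500000e01117c732  DISK  36.4 GB  OK
N/A       1I  1  11  0x500000e01115c352  N/A   N/A      FAILED
internal  1I  1  10  0x5000c5000032b839  DISK  36.4 GB  SPARE (activated)
internal  1I  1   9  0x5000c5000030b0c5  DISK  36.4 GB  UNASSIGNED
"""

# Hypothetical pattern for one device row: Location Ct Enc Bay WWID Type Capacity Status
ROW = re.compile(
    r"^(?P<location>\S+)\s+(?P<ct>\w+)\s+(?P<enc>\d+)\s+(?P<bay>\d+)\s+"
    r"(?P<wwid>0x[0-9a-fA-F]+)\s+(?P<type>\S+)\s+"
    r"(?P<capacity>N/A|[\d.]+\s+\w+)\s+(?P<status>.+?)\s*$"
)


def parse_device_rows(text):
    """Return one dict per row that matches the assumed layout."""
    devices = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            device = match.groupdict()
            # Ct:Enc:Bay is the address form the guide uses, e.g. 1I:1:10
            device["address"] = f"{device['ct']}:{device['enc']}:{device['bay']}"
            devices.append(device)
    return devices


def summarize(devices):
    """Split out the addresses of failed disks and activated spares."""
    failed = [d["address"] for d in devices if d["status"] == "FAILED"]
    spares = [
        d["address"]
        for d in devices
        if d["status"].startswith("SPARE") and "activated" in d["status"]
    ]
    return failed, spares


failed, active_spares = summarize(parse_device_rows(SAMPLE))
print("failed disks:", failed)
print("activated spares:", active_spares)
```

Run against the sample rows, the sketch reports disk 1I:1:11 as failed and spare 1I:1:10 as activated, which matches the situation described for the RECOVERING logical drive.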
Compromised fault tolerance
Compromised fault tolerance commonly occurs when more physical disks have failed than the
fault tolerance method can support. When fault tolerance fails, the logical volume also fails and
unrecoverable disk error messages are returned to the host. Data loss is likely to occur.
For example, suppose one drive fails in an array configured with RAID 5 fault tolerance while
another drive in the same array is still being rebuilt. If the array has no online spare, the logical
drive fails.
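The reasoning behind this example can be sketched as a simple count: a logical drive is compromised once more disks are unavailable than its RAID level is guaranteed to survive. The table and the is_compromised function below are illustrative assumptions, not controller logic. The sketch treats a disk that is still rebuilding as unavailable and ignores online spares.

```python
# Number of simultaneous disk losses each RAID level is guaranteed to survive.
# RAID 1+0 can survive more than one loss only when the failed disks belong
# to different mirror pairs, so the guaranteed figure is 1.
GUARANTEED_TOLERANCE = {
    "0": 0,
    "1": 1,
    "1+0": 1,
    "5": 1,
    "6": 2,
}


def is_compromised(raid_level, unavailable_disks):
    """Return True when more disks are unavailable than the level tolerates.

    unavailable_disks counts failed disks plus disks still being rebuilt
    (a modeling assumption for this sketch).
    """
    return unavailable_disks > GUARANTEED_TOLERANCE[raid_level]


# RAID 5 with one disk rebuilding and a second disk failing: two unavailable.
print(is_compromised("5", 2))
# RAID 5 with a single failed disk: still within tolerance.
print(is_compromised("5", 1))
```

The first call corresponds to the RAID 5 case described above, and the second to a single failure that the array can still absorb.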
Compromised fault tolerance can also be caused by non-disk problems, such as temporary power
loss to a storage system or a faulty cable. In such cases, the physical disks do not need to be
replaced. However, data can still be lost, especially if the system is busy when the problem occurs.
Recovering from fault tolerance failures
When fault tolerance has been compromised, inserting replacement disks does not improve the
condition of the logical drive. Instead, if your screen displays unrecoverable error messages,
follow these steps to recover data:
1. Power off the server, and then power it back on.
   In some cases, a marginal drive will work long enough to enable you to make copies of
   important files.
2. Make copies of important data if possible.