HP P4000 HP Smart Array SAS controllers for Integrity servers support guide - Page 111

For example, in the following sautil <device_file> command output excerpt, spare disk 1I:1:10 is being substituted for failed disk 1I:1:11, which is why the logical drive is in the RECOVERING state.

---- LOGICAL DRIVE SUMMARY ---------------------------------------------------
#    RAID    Size        Status
0    1+0     34700 MB    RECOVERING

---- SAS/SATA DEVICE SUMMARY -------------------------------------------------
Location  Ct  Enc  Bay  WWID                Type  Capacity  Status
internal  1I  1    12   0x500000e01117c732  DISK  36.4 GB   OK
N/A       1I  1    11   0x500000e01115c352  N/A   N/A       FAILED
internal  1I  1    10   0x5000c5000032b839  DISK  36.4 GB   SPARE (activated)
internal  1I  1    9    0x5000c5000030b0c5  DISK  36.4 GB   UNASSIGNED
internal  2I  1    16   0x500000e011213482  DISK  36.4 GB   UNASSIGNED
internal  2I  1    15   0x5000c500002084c9  DISK  73.4 GB   UNASSIGNED
internal  2I  1    14   0x5000c5000030b9c9  DISK  36.4 GB   UNASSIGNED
internal  2I  1    13   0x500000e01118a7a2  DISK  36.4 GB   UNASSIGNED

---- SAS/SATA ENCLOSURE SUMMARY ----------------------------------------------
Location  Ct  Enc  Expander_count  Bay_count  SEP_count
internal  1I  1    0               4          1
internal  2I  1    0               4          1

---- LOGICAL DRIVE 0 ---------------------------------------------------------
Logical Drive Device File........... c5t0d0
Fault Tolerance Mode................ RAID 1+0 (Disk Mirroring)
Logical Drive Size.................. 34700 MB
Logical Drive Status................ OK
# of Participating Physical Disks... 2
Participating Physical Disk(s)...... Ct:Enc:Bay:WWID
                                     1I:1:12:0x500000e01117c732
                                     1I:1:11:0x500000e01115c352 <-- NOT RESPONDING
Participating Spare Disk(s)......... Ct:Enc:Bay:WWID
                                     1I:1:10:0x5000c5000032b839 <-- activated for 1I:1:11:0x500000e01115c352
Stripe Size......................... 128 KB
Logical Drive Cache Status.......... cache enabled
Configuration Signature............. 0xA00148CC
Media Exchange Detected?............ no

For more information about the sautil command, see “The sautil command” (page 66).
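When many disks are attached, it can help to pick the failed disk and the activated spare out of the device summary automatically. The following Python sketch is one possible way to do that, not part of the sautil tooling. It assumes each device row is printed on a single line with whitespace-separated columns, as in the excerpt above; the names ROW, parse_devices, and disks_needing_attention are hypothetical, and the sample rows are copied from the excerpt.

```python
import re

# Sample rows copied from the SAS/SATA DEVICE SUMMARY excerpt.
# The one-row-per-line, whitespace-separated layout is an assumption.
SAMPLE = """\
internal  1I  1  12  0x500000e01117c732  DISK  36.4 GB  OK
N/A       1I  1  11  0x500000e01115c352  N/A   N/A      FAILED
internal  1I  1  10  0x5000c5000032b839  DISK  36.4 GB  SPARE (activated)
internal  1I  1  9   0x5000c5000030b0c5  DISK  36.4 GB  UNASSIGNED
"""

# Hypothetical pattern for one device row: Location, Ct, Enc, Bay, WWID,
# Type, Capacity (either "N/A" or "<number> <unit>"), then Status.
ROW = re.compile(
    r"^\s*(?P<location>\S+)\s+(?P<ct>\d+[IE])\s+(?P<enc>\d+)\s+(?P<bay>\d+)\s+"
    r"(?P<wwid>0x[0-9a-fA-F]+)\s+(?P<type>\S+)\s+"
    r"(?P<capacity>N/A|[\d.]+\s+\w+)\s+(?P<status>.+?)\s*$"
)


def parse_devices(text):
    """Return one dict per device row; lines that do not match are skipped."""
    devices = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            device = match.groupdict()
            # Build the Ct:Enc:Bay address used elsewhere in the output.
            device["address"] = f'{device["ct"]}:{device["enc"]}:{device["bay"]}'
            devices.append(device)
    return devices


def disks_needing_attention(devices):
    """Return (failed disk addresses, activated spare addresses)."""
    failed = [d["address"] for d in devices if d["status"] == "FAILED"]
    spares = [d["address"] for d in devices
              if d["status"].startswith("SPARE (activated")]
    return failed, spares


if __name__ == "__main__":
    failed, spares = disks_needing_attention(parse_devices(SAMPLE))
    print("Failed disks:", failed)
    print("Activated spares:", spares)
```

In practice the text would come from saved sautil <device_file> output rather than the embedded sample; the pattern may need adjusting if the real column layout differs.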

Compromised fault tolerance

Compromised fault tolerance commonly occurs when more physical disks have failed than the fault tolerance method can support. When fault tolerance fails, the logical volume also fails and unrecoverable disk error messages are returned to the host. Data loss is likely to occur.

For example, suppose one drive fails in an array configured with RAID 5 fault tolerance while another drive in the same array is still being rebuilt. If the array has no online spare, the logical drive fails.
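The reasoning in this example can be sketched as a simple check. The following Python snippet is illustrative only: the tolerated-failure counts reflect general RAID behavior rather than values read from the controller, and the function names are hypothetical. A disk that is still being rebuilt is counted as unavailable, because it does not yet hold a complete copy of its data.

```python
# Number of unavailable disks a parity or striped array survives.
# These are general RAID properties, not controller-reported values.
TOLERATED_FAILURES = {"0": 0, "5": 1, "6": 2}


def parity_array_compromised(raid_level, failed, rebuilding=0):
    """True if failed plus rebuilding disks exceed what the level tolerates."""
    unavailable = failed + rebuilding
    return unavailable > TOLERATED_FAILURES[raid_level]


def mirrored_array_compromised(mirror_pairs, unavailable):
    """For RAID 1 or 1+0: True if both disks of any mirror pair are unavailable.

    mirror_pairs is a list of (disk_a, disk_b) tuples and unavailable is a
    set of disk addresses; both are supplied by the caller.
    """
    return any(a in unavailable and b in unavailable for a, b in mirror_pairs)


if __name__ == "__main__":
    # RAID 5 with one failed disk while another is still rebuilding.
    print(parity_array_compromised("5", failed=1, rebuilding=1))
    # RAID 1+0 pair from the earlier excerpt with only 1I:1:11 unavailable.
    print(mirrored_array_compromised([("1I:1:12", "1I:1:11")], {"1I:1:11"}))
```

For mirrored levels the outcome depends on which disks are unavailable, not only how many, which is why the mirrored check takes the pairs explicitly.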

Compromised fault tolerance can also be caused by non-disk problems, such as temporary power loss to a storage system or a faulty cable. In such cases, the physical disks do not need to be replaced. However, data can still be lost, especially if the system is busy when the problem occurs.

Recovering from fault tolerance failures

When fault tolerance has been compromised, inserting replacement disks does not improve the condition of the logical drive. Instead, if your screen displays unrecoverable error messages, follow these steps to recover data:

1. Power off the server, and then power it back on. In some cases, a marginal drive will work long enough to enable you to make copies of important files.

2. Make copies of important data if possible.


