
HP Surestore E Disk Array XP256 HP XP P9000 External Storage Access Manager Us - Page 51



Section	Page
HP XP P9000 External Storage Access Manager User Guide	1
Contents	3
1 External Storage Access Manager overview	7
How ESAM works	7
ESAM components	8
P9500 storage systems	8
Main and remote control units	9
Pair volumes	9
Data paths	9
Quorum disk	9
Multipath software	9
Remote Web Console GUI	10
RAID Manager	10
Data replication	10
Failover	10
2 System implementation planning and system requirements	11
The workflow for planning External Storage Access Manager implementation	11
Required hardware	11
Multipath software	12
Storage system requirements	12
Licenses	12
License capacity	12
Pair volume requirements	13
Quorum disk requirements	14
Data path requirements and recommendations	14
Remote Web Console requirements	15
External storage systems	15
Planning failover	16
Preventing unnecessary failover	17
Sharing volumes with other HP software products	17
Cache Partition	18
Cache Residency	19
Performance Monitor	19
LUN Manager	19
Open Volume Management	19
LUN Expansion	19
Configurations with Business Copy volumes	19
Configuring ESAM with Business Copy	19
Configurations with Continuous Access Journal volumes	20
Configuring ESAM with Continuous Access Journal	20
3 System configuration	21
The basic workflow for configuring the system configuration	21
Connecting the hardware components	21
Prerequisites	22
The workflow for connecting the hardware components	22
Installing and configuring software	22
Additional documentation	23
Prerequisites	23
The workflow for installing and configuring External Storage Access Manager	23
Configuring the primary and secondary storage systems	23
Additional documentation	23
Prerequisites	24
Workflow	24
Configuring the quorum disks	24
Prerequisites	24
Procedure	24
Adding the ID for the quorum disk to the storage systems	25
Prerequisites	25
Procedure	25
Configuring host mode options	26
Prerequisites	26
Procedure	26
4 Working with volume pairs	27
Workflow for ESAM volume pairs	27
Reasons for checking pair status	27
When to check pair status?	27
How pair status reflects system events and use	27
What pairs information can you view and where is it?	28
Where to find the information	29
How hosts see volume pairs	29
Checking pair status	29
Pair status values	30
Split types (PSUS status)	32
Suspend types (PSUE status)	32
Volume pair creation	33
Creating an ESAM pair	33
Prerequisites	33
Procedure	33
Verifying host recognition of a new pair	35
Verification	35
How multipath software shows storage serial number for pairs	36
Splitting pairs	36
Prerequisites	36
Procedure	37
Resynchronizing pairs	37
Reverse resynchronization	38
Prerequisites	38
Procedure	38
Releasing a pair	39
Changing Continuous Access Synchronous pairs to ESAM pairs	40
Requirements	40
Procedure	40
Comparison of the RAID Manager commands and Remote Web Console	41
5 System maintenance	43
Applications used to perform maintenance tasks	43
Required Remote Web Console settings	43
Related documentation	43
The different types of maintenance tasks	43
Switching paths using multipath software	43
Discontinuing ESAM operations	43
Quorum disk ID deletion	44
Deleting quorum disk IDs (standard method)	44
Deleting quorum disk IDs by system attribute (forced deletion)	45
Recovery of accidentally deleted quorum disks	45
Recovering the disk when the P-VOL was receiving host I/O at deletion	45
Recovering the disk when the S-VOL was receiving host I/O at deletion	46
Planned outages for system components	46
Options for performing the planned outages	46
The procedures for performing planned outages	47
Performing planned outages (quorum disk only)	47
Performing planned outages (primary storage system and quorum disk)	48
Performing planned outages (secondary storage system and quorum disk)	49
Performing planned outages (both storage systems and quorum disk)	50
6 Disaster recovery	51
Main types of failures that can disrupt your system	51
The basic recovery process	51
System failure messages	51
Detecting failures	51
Option 1: Check for failover first	51
Using Remote Web Console to check for failover	52
Using RAID Manager to check for failover	52
Using multipath software to check for failover	52
Option 2: Check for failures only	52
Determining which basic recovery procedures to use	53
Selecting Procedures	53
Recovery from blocked pair volumes	54
Recovering from primary volume failure on the MCU	54
Recovering from secondary volume failure on the MCU	55
Recovering from primary volume failure on the RCU	56
Recovering from secondary volume failure on the RCU	57
Recovery from quorum disk failure	57
Replacement of quorum disks	57
Replacing a quorum disk when the MCU is receiving host I/O	57
Replacing a quorum disk when the RCU is receiving host I/O	58
Recovery from power failure	59
Primary storage system recovery	59
Recovering the system when the RCU is receiving host I/O updates	60
Recovering the system when host I/O updates have stopped	60
Secondary system recovery	61
Recovering the system when the P-VOL is receiving host updates	61
Recovering the system when host updates have stopped	61
Recovery from failures using resynchronization	62
Required conditions	62
Determining which resynchronization recovery procedure to use	63
Prerequisites	63
Procedure	64
Recovering primary volume from Business Copy secondary volume	64
Prerequisites	64
Procedure	64
Recovering secondary volume from Continuous Access Journal primary volume	64
Prerequisites	64
Procedure	65
Postrequisites	65
Recovering from path failures	65
Allowing host I/O to an out-of-date S-VOL	66
7 Using ESAM in a cluster system	67
Cluster system architecture	67
Required software	67
Supported cluster software	67
Configuration requirements	68
Configuring the system	68
Disaster recovery in a cluster system	68
Restrictions	69
8 Troubleshooting	70
Potential causes of errors	70
Is there an error message for every type of failure?	70
Where do you look for error messages?	70
Basic types of troubleshooting procedures	70
Troubleshooting general errors	70
Suspended volume pair troubleshooting	72
The workflow for troubleshooting suspended pairs when using Remote Web Console	72
Troubleshooting suspended pairs when using RAID Manager	73
Location of the RAID Manager operation log file	73
Example log file	73
Related topics	75
Recovery of data stored only in cache memory	75
Pinned track recovery procedures	75
Recovering pinned tracks from volume pair drives	75
Recovering pinned tracks from quorum disks	75
9 Support and other resources	76
Contacting HP	76
Subscription service	76
Documentation feedback	76
Related information	76
HP websites	77
Conventions for storage capacity values	77
Typographic conventions	77
Rack stability	78
A Conventions	79
Business Copy and Snapshot volumes	79
B ESAM GUI reference	80
Pair Operation window	80
Possible VOL Access values for pairs	82
Detailed Information dialog box	83
Paircreate(ESAM) dialog box	85
Pairsplit-r dialog box	87
Pairresync dialog box	87
Pairsplit-S dialog box	88
Quorum Disk Operation window	89
Add Quorum Disk ID dialog box	90
Glossary	92


6 Disaster recovery

On-site disasters, such as power supply failures, can disrupt the normal operation of your ESAM system. Being able to quickly identify the type of failure and recover the affected system or component helps to ensure that you can restore high-availability protection for host applications as soon as possible.

Main types of failures that can disrupt your system

The main types of failures that can disrupt the system are power failures, hardware failures, connection or communication failures, and software failures. These types of failures can cause system components to function improperly or stop functioning.

System components typically affected by these types of failures include:

• Main control unit (primary storage system)
• Service processor (primary or secondary storage system)
• Remote control unit (secondary storage system)
• Volume pairs
• Quorum disks

The basic recovery process

The basic process for recovering from an on-site disaster is the same, regardless of the type of failure that caused the disruption in the system. The recovery process involves:

• Detecting failures
• Determining the type of failure
• Determining which recovery procedure to use
• Completing the recovery procedure
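
The four steps above can be read as a simple pipeline. The following Python sketch is illustrative only: the `FailureType` values mirror the failure categories named on this page, but the function names, the keyword matching, and the `procedures` mapping are hypothetical placeholders and are not ESAM, Remote Web Console, or RAID Manager interfaces.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class FailureType(Enum):
    """Failure categories named in this chapter."""
    POWER = "power failure"
    HARDWARE = "hardware failure"
    CONNECTION = "connection or communication failure"
    SOFTWARE = "software failure"


def detect_failures(messages: List[str]) -> List[str]:
    """Step 1: keep only messages that report a failure (hypothetical keyword check)."""
    return [m for m in messages if "fail" in m.lower()]


def determine_type(message: str) -> Optional[FailureType]:
    """Step 2: map a message to a failure type (hypothetical keyword matching)."""
    text = message.lower()
    for failure_type in FailureType:
        # The first word of each category ("power", "hardware", ...) is the keyword.
        if failure_type.value.split()[0] in text:
            return failure_type
    return None


def select_procedure(
    failure_type: FailureType,
    procedures: Dict[FailureType, Callable[[], str]],
) -> Callable[[], str]:
    """Step 3: look up the recovery procedure for this failure type."""
    return procedures[failure_type]


def recover(
    messages: List[str],
    procedures: Dict[FailureType, Callable[[], str]],
) -> List[str]:
    """Run steps 1 to 4 in order and return the result of each procedure."""
    results = []
    for message in detect_failures(messages):
        failure_type = determine_type(message)
        if failure_type is None:
            continue  # Unclassified message: needs manual investigation.
        procedure = select_procedure(failure_type, procedures)
        results.append(procedure())  # Step 4: complete the recovery procedure.
    return results
```

In practice the type of failure is determined from the system failure messages and the recovery procedure is chosen from this guide, so the keyword matching here only stands in for that judgment.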

System failure messages

The system automatically generates messages that you can use to detect failures and determine the type of failure that occurred. The messages contain information about the type of failure.

• System information messages (SIM): generated by the primary and secondary storage systems
• Path failure messages: generated by the multipath software on the host
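
As a rough illustration of where each message type comes from, the following Python sketch groups collected messages by the component that generates them. The mapping reflects the two message types described above; the `FailureMessage` class, its fields, and `group_by_source` are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List

# Message type -> component that generates it, as described in this section.
MESSAGE_SOURCES: Dict[str, str] = {
    "SIM": "primary and secondary storage systems",
    "path failure": "multipath software on the host",
}


@dataclass(frozen=True)
class FailureMessage:
    kind: str  # "SIM" or "path failure"
    text: str  # hypothetical free-form message body


def group_by_source(messages: List[FailureMessage]) -> Dict[str, List[str]]:
    """Group message texts under the component that generated them."""
    grouped: Dict[str, List[str]] = {source: [] for source in MESSAGE_SOURCES.values()}
    for message in messages:
        grouped[MESSAGE_SOURCES[message.kind]].append(message.text)
    return grouped
```

Grouping this way shows which component to examine for each message: the storage systems for SIMs, or the host's multipath software for path failure messages.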

Detecting failures

Detecting failures is the first task in the recovery process. Failure detection is essential because you need to know the type of failure before you can determine which recovery procedure to use.

You have two options for detecting failures. You can check to see if failover has occurred and then determine the type of failure that caused it, or you can check to see if failures have occurred by using the SIM and path failure system messages.

• “Option 1: Check for failover first” (page 51)
• “Option 2: Check for failures only” (page 52)

Option 1: Check for failover first

You can use the status of the secondary volume together with path status information to see whether failover occurred. You can check this using Remote Web Console (RWC), RAID Manager, or multipath software.
