HP StoreOnce 4430 Backup System Concepts and Configuration Guide - Page 49



Designing for failover
In the failed-over condition one node is effectively doing the work of two, so there is some performance degradation, but backup jobs will continue after the autonomic failover. The following best practices apply when designing for autonomic failover support:
• The customer must choose whether SLAs will remain the same after failover as before. If they must, the solution should be sized in advance to use only up to 50% of the available performance. This ensures sufficient headroom in system resources so that, in the case of failover, there is no appreciable degradation in performance and the SLAs are still met (see the sizing sketch after this list).
• For customers who are more price-conscious and for whom failover is an “exception condition”, the solution can be sized for cost effectiveness. Here most of the available throughput is utilized on the nodes, so when failover happens there will be a degradation in performance. The amount of degradation depends on the relative “imbalance” of throughput requirements between the two nodes; this is another reason to keep both nodes in a couplet as evenly loaded as possible.
• Ensure the correct ISV patches/scripts are applied and do a dry run to test the solution. In some cases a post-execution script must be added to every backup job/policy. The customer can configure which jobs will retry in the event of failover (which is a temporary condition), in order to limit the load on the single remaining node in the couplet, by:
◦ Adding the post-execution retry script only to the most urgent and important jobs, not all jobs. This is the method for HP Data Protector (see the retry-hook sketch after this list).
◦ Modifying the “bring device back on line” scripts to apply only to certain drives and robots, those used by the most urgent and important jobs. This is the method for Symantec NetBackup.
• Remember that replication is also considered a virtual device within a service set, so replication fails over along with the backup devices.
• For replication failover there are two scenarios:
◦ Replication was not running, that is, failover occurred outside the replication window; in this case replication will start when the replication window is next open.
◦ Replication was in progress when failover occurred; in this case, after failover has completed, replication starts again from the last known good checkpoint (taken roughly every 10 MB of replicated data). The checkpoint sketch after this list illustrates this resume behavior.
• Failback (via the CLI or GUI) is a manual process and should be scheduled to occur during a period of inactivity.
• Remember that all failover-related events are recorded in the Event Logs.
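
To make the sizing trade-off above concrete, here is a minimal sketch, assuming hypothetical per-node throughput figures: it checks whether the combined load of a couplet still fits on one node after failover, and estimates the slowdown when it does not. The numbers and function are illustrative, not HP sizing guidance.

```python
# Hypothetical sizing check for a two-node couplet (illustrative only).
# Assumption: each node sustains `capacity` MB/s and carries `load` MB/s
# of backup traffic. After failover one node must carry both loads.

def failover_headroom(load_a: float, load_b: float, capacity: float) -> None:
    """Report whether SLAs survive failover for a couplet."""
    combined = load_a + load_b          # all work lands on the surviving node
    if combined <= capacity:
        # Sized to <= 50% per node: no appreciable degradation after failover.
        print(f"OK: {combined:.0f} MB/s fits on one {capacity:.0f} MB/s node")
    else:
        # Cost-optimized sizing: jobs still run, but more slowly.
        slowdown = combined / capacity
        print(f"Degraded: jobs run ~{slowdown:.1f}x slower after failover")

# Evenly loaded nodes at 50% of capacity meet SLAs after failover;
# heavier or unbalanced loads do not.
failover_headroom(500, 500, 1000)   # OK
failover_headroom(800, 400, 1000)   # Degraded: ~1.2x slower
```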
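The selective-retry idea can be sketched as follows. This is not the actual HP Data Protector or NetBackup scripting interface; the JOB_NAME and JOB_EXIT environment variables and the job names are hypothetical placeholders for whatever the ISV's post-execution environment provides. The point illustrated is that only jobs on an urgent list are retried, limiting the load on the surviving node.

```python
# Hypothetical post-execution hook (illustrative only): after failover,
# retry a failed backup job only if it is on the urgent list.
import os

# Only these (made-up) jobs are worth re-running on the surviving node.
URGENT_JOBS = {"exchange-daily", "erp-hourly"}

def post_exec_hook() -> None:
    # JOB_NAME and JOB_EXIT stand in for whatever variables the ISV's
    # post-execution environment actually provides.
    job = os.environ.get("JOB_NAME", "")
    failed = os.environ.get("JOB_EXIT", "0") != "0"
    if failed and job in URGENT_JOBS:
        # A real hook would invoke the ISV's resubmit command here;
        # this sketch only reports the decision.
        print(f"retrying urgent job: {job}")
    elif failed:
        print(f"not retrying {job!r}: leave headroom on the surviving node")

if __name__ == "__main__":
    post_exec_hook()
```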
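The checkpointed restart of replication can be pictured with a small sketch. The roughly 10 MB checkpoint interval comes from the text above; the function itself is an assumed simplification, not the B6200's actual implementation.

```python
# Illustrative model of checkpointed replication resume (not HP's code).
CHECKPOINT_INTERVAL = 10 * 1024 * 1024   # checkpoint roughly every 10 MB

def resume_offset(bytes_replicated_at_failover: int) -> int:
    """Offset to resume from after failover: the last checkpoint at or
    below the point the transfer had reached, so at most ~10 MB of
    already-sent data is re-replicated."""
    return (bytes_replicated_at_failover // CHECKPOINT_INTERVAL) * CHECKPOINT_INTERVAL

# Failover hits after 57 MB of a replication job: resume from the 50 MB
# checkpoint, re-sending only the last 7 MB instead of the whole job.
print(resume_offset(57 * 1024 * 1024) // (1024 * 1024))  # -> 50
```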
Key Failover FC zoning considerations
The same considerations apply when configuring Fibre Channel as when configuring the network. Care must be taken to ensure there is no single point of failure in switch or fabric zoning that would negate the autonomic failover capability of the HP B6200 Backup System. Conformance to the following rules will help to ensure successful failover:
• Fibre Channel switches used with HP StoreOnce must support NPIV. For a full list see: http://www.hp.com/go/ebs.
• Use WWPN zoning (rather than port-based zoning).
• In a single-fabric configuration, ensure the equivalent FC ports from each B6200 node in a couplet are presented to the same FC switch; see Scenario 1. A zoning check along these lines is sketched below.
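
As a sanity check on these rules, the following sketch validates a single-fabric zoning plan: for each couplet, the equivalent ports of both nodes must be presented to the same switch. The switch names and WWPN values are made up; real zoning is configured through the fabric switch's own management tools.

```python
# Hypothetical zoning sanity check (illustrative only; real zoning is
# configured on the FC switch itself).

# Map of switch name -> set of WWPNs presented to it. Values are made up.
fabric = {
    "switch-1": {"50:01:43:80:aa:00:00:01",   # node A, port 1
                 "50:01:43:80:bb:00:00:01"},  # node B, port 1 (same switch)
    "switch-2": {"50:01:43:80:aa:00:00:02",
                 "50:01:43:80:bb:00:00:02"},
}

# Equivalent-port pairs across the two nodes of a couplet.
couplet_pairs = [
    ("50:01:43:80:aa:00:00:01", "50:01:43:80:bb:00:00:01"),
    ("50:01:43:80:aa:00:00:02", "50:01:43:80:bb:00:00:02"),
]

def check_single_fabric(fabric: dict, pairs: list) -> bool:
    """Equivalent FC ports from each node must sit on the same switch,
    so a failed-over virtual port is still reachable by existing zones."""
    ok = True
    for wwpn_a, wwpn_b in pairs:
        on_same_switch = any(wwpn_a in ports and wwpn_b in ports
                             for ports in fabric.values())
        if not on_same_switch:
            print(f"Zoning risk: {wwpn_a} and {wwpn_b} on different switches")
            ok = False
    return ok

print("zoning OK" if check_single_fabric(fabric, couplet_pairs) else "fix zoning")
```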