HP ProLiant 4500 Compaq ProLiant Cluster HA/F100 and HA/F200 Administrator Guide - Page 71

Manual vs. Automatic Failback, Failover and Failback Policies



2-40
Compaq ProLiant Clusters HA/F100 and HA/F200 Administrator Guide
Another example of a direct-connect device is a directly connected mainframe
interface. If the first server is directly connected to the mainframe, as through
an SDLC (Synchronous Data Link Control) card in the server, there is no way
to switch the physical connection to a second server. In a case like this, you
may be able to use the client network to access the mainframe using TCP/IP.
Since TCP/IP addresses can be configured to fail over, you may be able to
reestablish the connection after a switch. However, many mainframe
connectivity applications use the Media Access Control (MAC) address that is
burned into the NIC to communicate with the server. The second server
presents a different MAC address, and MAC addresses cannot be configured to
fail over, so these applications may not be able to reconnect after a switch.

Carefully examine the direct-connect devices on each server to determine
whether you need to provide alternate solutions outside of what the cluster
hardware and software can accomplish. These devices can be considered
single points of failure because the cluster components may not be able to
provide failover capabilities for them.
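
One way to organize this examination is to list each direct-connect device and flag the ones the cluster cannot fail over by itself. The following is a minimal sketch only: the `DirectConnectDevice` record, the `addressing` labels, and the sample inventory are hypothetical names introduced for illustration and are not part of MSCS or of any Compaq tool.

```python
from dataclasses import dataclass


@dataclass
class DirectConnectDevice:
    server: str
    name: str
    # Hypothetical labels for how clients reach the device:
    # "physical" (cabled to one server), "mac" (bound to a NIC's
    # burned-in MAC address), or "tcpip" (reached by IP address).
    addressing: str


# Per the text above, only TCP/IP addresses can be configured to fail over.
FAILOVER_CAPABLE = {"tcpip"}


def single_points_of_failure(devices):
    """Return the devices the cluster cannot fail over on its own."""
    return [d for d in devices if d.addressing not in FAILOVER_CAPABLE]


# Illustrative inventory; replace with the devices on your own servers.
inventory = [
    DirectConnectDevice("server1", "SDLC mainframe card", "physical"),
    DirectConnectDevice("server1", "mainframe application bound to NIC MAC", "mac"),
    DirectConnectDevice("server1", "mainframe session over client network", "tcpip"),
]

for device in single_points_of_failure(inventory):
    print(f"{device.server}: {device.name} needs an alternate solution")
```

Each device the sketch reports is a candidate single point of failure that needs a solution outside the cluster hardware and software.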

Manual vs. Automatic Failback

Failback is the act of integrating a failed cluster node back into the cluster.
Specifically, it brings cluster groups and resources back to their preferred
server. MSCS offers automatic and manual failback options. The automatic
failback event will occur whenever the preferred server is reintegrated into the
cluster. If the reintegration occurs during normal business hours, there may be
a slight interruption in service for network clients during the failback process.
If the interruption needs to occur in nonpeak hours, be sure to set the failback
policy to “Allow” and set the “Between Hours” settings to acceptable values.
For full control over when a cluster node is reintegrated, use manual failback
by choosing “Prevent” as the failback policy.

Many organizations prefer to use manual failback for business-critical clusters.
This prevents applications from automatically failing back to a server that has
failed, automatically rebooted, and automatically rejoined the cluster before
the root cause of the original error has been determined.

These terms are described and illustrated in the Group Failover/Failback
Policy Worksheet provided in the following section.
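
The effect of the “Allow,” “Prevent,” and “Between Hours” choices can be modeled in a few lines. This is a sketch under stated assumptions, not MSCS code: the `FailbackPolicy` record and `may_fail_back_now` function are hypothetical names, hours are whole numbers from 0 to 23, and the window is assumed to include its start hour and exclude its end hour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailbackPolicy:
    mode: str                            # "Allow" or "Prevent"
    window_start: Optional[int] = None   # hour 0-23; None means no window
    window_end: Optional[int] = None     # hour 0-23; None means no window


def may_fail_back_now(policy: FailbackPolicy, hour: int) -> bool:
    """Decide whether automatic failback may happen at the given hour."""
    if policy.mode == "Prevent":
        # Manual failback: an administrator moves the groups back.
        return False
    if policy.window_start is None or policy.window_end is None:
        # "Allow" with no window: fail back as soon as the node rejoins.
        return True
    start, end = policy.window_start, policy.window_end
    if start <= end:
        # Window within one day (a zero-length window never matches here).
        return start <= hour < end
    # Window that wraps past midnight, for example 22 to 4.
    return hour >= start or hour < end


nightly = FailbackPolicy("Allow", window_start=22, window_end=4)
print(may_fail_back_now(nightly, 23))                    # inside the window
print(may_fail_back_now(nightly, 14))                    # business hours
print(may_fail_back_now(FailbackPolicy("Prevent"), 23))  # manual failback
```

With the nightly window, a preferred server that rejoins at 2 p.m. keeps its groups on the other node until 10 p.m., which moves the brief client interruption into nonpeak hours.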

Failover and Failback Policies

In the “Cluster Groups” section of this chapter, you created one or more
cluster group definition worksheets (Figure 2-7). For each cluster group
defined in the worksheets, you will now determine its failover and failback
policies by filling in the Group Failover/Failback Policy worksheet.