
HP ProLiant 4500 Compaq ProLiant Cluster HA/F100 and HA/F200 Administrator Gui - Page 70

Failover of Directly Connected Devices, Failover Threshold and Failover Period





Designing the Compaq ProLiant Clusters HA/F100 and HA/F200


Failover Threshold and Failover Period

The failover threshold and failover period are similar to the restart values. The failover threshold defines the maximum number of times per failover period that MSCS attempts to fail over a cluster group. If the cluster group exceeds the failover threshold in the allotted failover period, the group is left on its current node, in its current state, whether that is online, offline, or partially online.

The failover threshold and failover period prevent a cluster group from bouncing back and forth between servers. If a cluster group is so unstable that it cannot run properly on either cluster node, it will eventually be left in its current state on one of the nodes. The failover threshold and period determine the point at which the decision is made to leave the cluster group in its current state.
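The threshold check described above can be sketched as a count of recent failovers inside a time window. This is a minimal model for planning purposes, not the MSCS implementation: the function name `may_fail_over`, its parameters, and the choice of a sliding window measured back from the current time are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def may_fail_over(past_failovers, now, threshold, period_hours):
    """Return True if one more failover attempt stays within the threshold.

    past_failovers: datetimes of earlier failovers of the group (hypothetical input).
    threshold:      maximum number of failovers allowed per period.
    period_hours:   length of the failover period in hours.
    """
    # Assumption: the period is treated as a sliding window ending at `now`.
    window_start = now - timedelta(hours=period_hours)
    recent = [t for t in past_failovers if t > window_start]
    # Once `threshold` failovers have occurred in the window, the group
    # is left on its current node in its current state.
    return len(recent) < threshold
```

For instance, with a threshold of 10 and a period of 3 hours, a group that has already failed over 10 times in the last 3 hours gets `False`, while failovers older than 3 hours no longer count against it.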

The following example illustrates the relationship between the restart threshold and period and the failover threshold and period.

Assume you have a cluster group (Group1) that is configured to have a preferred server (Server1). If Group1 encounters an event that forces it offline, MSCS first attempts to restart the failed resource. If Group1 cannot be brought back online within the limits of the restart threshold and period, MSCS attempts to fail over Group1 to the other cluster node (Node2). If the failover threshold for Group1 is set to 10 and the failover period is set to 3 (hours), MSCS will fail over Group1 as many as 10 times in a 3-hour period. If failures are still forcing Group1 offline once that limit has been reached within the 3-hour period, MSCS no longer attempts to fail over the group and leaves it in its current state.
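The escalation in the Group1 example (restart first, then fail over, then give up) can be written as a small decision function. This is an illustrative sketch only: `next_action` and its parameters are hypothetical names, the caller is assumed to have already counted the restarts and failovers that fall inside the respective periods, and the restart threshold of 3 used below is an invented value, since the text gives figures only for the failover settings.

```python
def next_action(restarts_in_period, failovers_in_period,
                restart_threshold, failover_threshold):
    """Pick the next recovery step for a group that has just gone offline."""
    # Step 1: try a local restart while the restart limit is not used up.
    if restarts_in_period < restart_threshold:
        return "restart"
    # Step 2: restarts are exhausted, so move the group to the other node
    # while the failover limit is not used up.
    if failovers_in_period < failover_threshold:
        return "fail over"
    # Step 3: both limits reached; the group stays where it is.
    return "leave in current state"


# Group1 with a failover threshold of 10 (restart threshold 3 is assumed).
print(next_action(0, 0, 3, 10))   # restart
print(next_action(3, 0, 3, 10))   # fail over
print(next_action(3, 10, 3, 10))  # leave in current state
```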

Failover of Directly Connected Devices

Devices that are physically connected to a server cannot move to the other cluster node. Therefore, any applications or resources dependent on these devices may be unable to restart on the other cluster node. Examples of direct-connect devices include printers, mainframe interfaces, modems, fax interfaces, and customized input devices such as bank card readers.

For example, if a server is providing print services to users, and the printer is directly connected to the parallel port of the server, there is no way to switch the physical connection to the other server, even though the print queue and spooler can be configured to fail over. The printer should be configured as a true network printer and connected to a hub that is accessible from either cluster node. In the event of a server failure, not only will the print queue and spooler fail over to the other server, but physical access to the printer will be maintained.
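When planning cluster groups, it can help to list which dependencies are tied to one server. The following is a rough planning aid under stated assumptions, not part of MSCS or any Compaq tool: the `Resource` record, its `attachment` field, and the `failover_blockers` function are hypothetical, and the inventory data must be supplied by the administrator.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    # Assumed labels: "direct" for a device cabled to one server
    # (for example, a parallel port), "network" for a device that
    # either cluster node can reach.
    attachment: str


def failover_blockers(group_resources):
    """Return the names of resources that cannot follow the group to the other node."""
    return [r.name for r in group_resources if r.attachment == "direct"]


print_group = [
    Resource("Print spooler", "network"),
    Resource("Parallel-port printer", "direct"),
]
print(failover_blockers(print_group))  # ['Parallel-port printer']
```

An empty result suggests that nothing in the group is cabled to a single server; any name that is returned is a candidate for conversion to a network-attached device.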