
Compaq ProLiant 2500 Compaq ProLiant Cluster HA/F100 and HA/F200 Administrator - Page 71

Manual vs. Automatic Failback, Failover and Failback Policies





Compaq ProLiant Clusters HA/F100 and HA/F200 Administrator Guide

Another example of a direct-connect device is a directly connected mainframe interface. If the first server is directly connected to the mainframe, as through an SDLC (Synchronous Data Link Control) card in the server, there is no way to switch the physical connection to a second server. In a case like this, you may be able to use the client network to access the mainframe using TCP/IP. Since TCP/IP addresses can be configured to fail over, you may be able to reestablish the connection after a switch. However, many mainframe connectivity applications use the Media Access Control (MAC) address that is burned into the NIC to communicate with the server. This would cause a problem because MAC addresses cannot be configured to fail over.
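
The difference between the two addressing schemes can be illustrated with a small model. The following is only a sketch under stated assumptions, not cluster software: the `Node` and `Cluster` classes, the node names, and the example addresses are all hypothetical. It shows a virtual IP address moving to the surviving node while each MAC address stays with its own NIC.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node; the MAC address is tied to its NIC."""
    name: str
    mac: str
    online: bool = True


class Cluster:
    """Hypothetical two-node model with one virtual IP that can fail over."""

    def __init__(self, virtual_ip: str, preferred: Node, standby: Node):
        self.virtual_ip = virtual_ip
        self.nodes = [preferred, standby]
        self.owner = preferred  # node currently answering for the virtual IP

    def fail_over(self) -> None:
        # The current owner fails; the virtual IP moves to an online node.
        self.owner.online = False
        self.owner = next(n for n in self.nodes if n.online)

    def reach_by_ip(self, ip: str):
        # Clients that address the virtual IP follow it to the new owner.
        if ip == self.virtual_ip and self.owner.online:
            return self.owner.name
        return None

    def reach_by_mac(self, mac: str):
        # Clients bound to a MAC address can only reach that exact NIC.
        for node in self.nodes:
            if node.mac == mac and node.online:
                return node.name
        return None


node_a = Node("node-a", "00:00:5e:00:53:01")
node_b = Node("node-b", "00:00:5e:00:53:02")
cluster = Cluster("192.0.2.10", node_a, node_b)
cluster.fail_over()
print(cluster.reach_by_ip("192.0.2.10"))  # served by the surviving node
print(cluster.reach_by_mac(node_a.mac))   # the failed NIC is unreachable
```

An application that connects by IP address finds the surviving node after the switch, while one that is bound to the first server's MAC address has nothing to reconnect to.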

Carefully examine the direct-connect devices on each server to determine whether you need to provide alternate solutions outside of what the cluster hardware and software can accomplish. These devices can be considered single points of failure because the cluster components may not be able to provide failover capabilities for them.

Manual vs. Automatic Failback

Failback is the act of integrating a failed cluster node back into the cluster. Specifically, it brings cluster groups and resources back to their preferred server. MSCS offers automatic and manual failback options. The automatic failback event will occur whenever the preferred server is reintegrated into the cluster. If the reintegration occurs during normal business hours, there may be a slight interruption in service for network clients during the failback process. If the interruption needs to occur in nonpeak hours, be sure to set the failback policy to “Allow” and set the “Between Hours” settings to acceptable values. For full control over when a cluster node is reintegrated, use manual failback by choosing “Prevent” as the failback policy.
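
The choice between “Prevent”, “Allow”, and “Allow” with “Between Hours” can be read as a simple decision rule. The sketch below is illustrative only and does not call any MSCS interface. The `FailbackPolicy` class and the `may_fail_back` function are hypothetical names, and the treatment of the window as starting at the first hour, ending before the second, and wrapping past midnight is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailbackPolicy:
    """Hypothetical model of a group's failback settings."""
    allow: bool                       # False models "Prevent" (manual failback)
    start_hour: Optional[int] = None  # "Between Hours" start, 0-23
    end_hour: Optional[int] = None    # "Between Hours" end, 0-23


def may_fail_back(policy: FailbackPolicy, hour: int) -> bool:
    """Return True if automatic failback is permitted at the given hour."""
    if not policy.allow:
        return False  # "Prevent": an administrator moves the group manually
    if policy.start_hour is None or policy.end_hour is None:
        return True   # "Allow" with no window: fail back immediately
    if policy.start_hour <= policy.end_hour:
        # Assumed half-open window within one day, e.g. 1 to 5.
        return policy.start_hour <= hour < policy.end_hour
    # Assumed wrap past midnight, e.g. 22 to 4.
    return hour >= policy.start_hour or hour < policy.end_hour


nonpeak = FailbackPolicy(allow=True, start_hour=22, end_hour=4)
print(may_fail_back(nonpeak, 23))                       # inside the window
print(may_fail_back(nonpeak, 12))                       # business hours
print(may_fail_back(FailbackPolicy(allow=False), 23))   # manual failback
```

With a window such as 22 to 4, a preferred server that rejoins at midday keeps its groups on the other node until the nonpeak window opens.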

Many organizations prefer to use manual failback for business-critical clusters. This prevents applications from automatically failing back to a server that has failed, automatically rebooted, and automatically rejoined the cluster before the root cause of the original error has been determined.

These terms are described and illustrated in the Group Failover/Failback Policy Worksheet provided in the following section.

Failover and Failback Policies

In the “Cluster Groups” section of this chapter, you created one or more cluster group definition worksheets (Figure 2-7). For each cluster group defined in the worksheets, you will now determine its failover and failback policies by filling in the Group Failover/Failback Policy worksheet.